1 vs 1.0

tl;dr; Real numbers and integers are entirely different things. They’re often incorrectly equated because both are plotted on the same number line.

I used to think that I understood the difference, but I didn’t. This false feeling of comprehension was because there are two different differences at play:

The difference between real numbers and integers.
The difference between floating point numbers and reals.

I understood the second one, but not the first, until recently. This understanding doesn’t have great practical benefit, but it did bring me great intellectual joy, so I thought I’ll share my realisation.

So I knew that there were natural numbers, and from them arose integers, and from that folks made rational numbers, and from those folks made real numbers, and some folks didn’t just stop there, because why not, and they made complex numbers.

The thing I didn’t understand was that the jump from rational numbers to real numbers was not just a jump in quantity, it was a jump in kind too.

Real numbers are fundamentally a different thing than integers, and there is nothing they have in common except the fact that both can be plotted on the same high-school number line.

As mathematicians (or even non-mathematicians who paid attention in their set theory classes) would say, they are different sets. Well, so are integers and rational numbers, but the difference here is that integers and real numbers are entirely different kinds of sets (I think the mathematical term I'm circling around is cardinality).

When programming, one usually doesn’t care about these set-theoretical specifics. Of course, sometimes one care, say when dealing with monetary amounts, but usually in day-to-day code, these differences are more like irritants, dealt with by hapless ones like myself by inserting casts from say double to int here and there to shut the compiler up.

Doing these haphazard casting makes the code quite often not work in edge cases, but then one sits down and figures out the math and inserts the proper ceil or floor instead of doing the default rounding or truncating cast. And with that, all’s good in the world again.

The thing that doesn’t work though is comparing double values for equality. A vintage 1.0, travelling through a series of calculations, often ends back from its journey a battered 0.999.

This is a surprise the first time, but then one reads What Every Programmer Must Know About Floating Point Numbers (or some Stack Overflow answer that summarises it to a sentence), and figures out that floating point values are inexact, operation-order dependent representations of the actual, underlying, real number. Some times, say with a number like 𝜋, there aren’t even enough bits in the universe to represent the underlying real number, but that’s fine, 64 bits ought to be enough for anybody.

This is the second difference I mentioned above - the difference between floating point numbers and reals. I understood this, and this was my go to gotcha (that I never got to use) for unsuspecting noobs, asking them why a for loop summing up an array of floating point values returned different results if I reversed the array first.

Living life as a programmer with such understanding, I came across Haskell many years ago.

And did Haskell frustrate me in this regard. Integral, Int, Floating, Fractional, fromIntegral, Double, Natural, Num - there was a seemingly endless supply of number related concepts that didn’t seem to play well with each other.

I must admit, in my moments of weakness, I wondered why not just throw all this complexity away and do what JavaScript did - everything is just a “number” (a double). No integers or casting to take care of it. Simple, pragmatic and often more efficient (due to reduced conversions) code.

But I kept my fantasies to myself, and trundled on, inserting an fromIntegral here, a toRational there, to get GHCi to accept my meagre offerings of code.

Years later, when I had a bit of time, I sat down and read through the docs for all the Num instances, and found myself agreeing to the premises. The Num hierarchy made sense. And I found this comment in GHC source, which illustrated the kerfuffle involved in these conversions, and made me me see how the current approach was a good compromise:

Note [Optimising conversions between numeric types]

... We don't want to have to use one conversion function per pair of types as that would require N^2 functions.

... The following kind of class would allow us to have a single conversion function but at the price of N^2 instances and teh use of MultiParamTypeClasses extension.

... So what we do instead is that we use the Integer type (signed, unbounded) as a passthrough type to perform every conversion.

Clever.

But deep inside, it all seemed rather over engineered to me still. As in, yes, I see how they are different categories, but I just want a number mate.

My moment of realization came when (don’t ask me why) I was doing this thought experiment:

How long will it take for me to pick an integer if I keep picking random real numbers?

The answer is - never!

The realisation itself dawned in two phases. First was the immediate and ultimately flawed one: there are so infinitely many real numbers that if we randomly keep picking from them, there is never any chance of us hitting on a real number with the special property that it is also a nice round integer.

This sounds a bit suspicious when put to words - theoretically speaking, isn't there a, miniscule as it is, chance of us hitting on, say 1, if we keep picking real numbers between, say 0.9 and 1.1?

I don’t have the mathematical wherewithal (or certainty) to prove this. I think the framework that deals with such questions is probability theory, in particular the concept of a “measure”. But it appears in my mind as an obvious truth, that this probability is zero, even theoretically.

This is not a mathematical trick, I think it is rather to do with the nature of these infinities. There are infinitely more real numbers than there are integers. Our minds can, or can be trained to, deal with the infinity of integers, but dealing with the infinity of real numbers can melt our minds. Ask Cantor.

But later on, as a slow burner, came the second realisation - that the probability is zero not just because of measure theory, but because the integers just aren’t there! That is, integers are not a part of real numbers!

Given any integer, we can construct a real number as close to that integer on the number line as we want. That’s the sense in which integers and real numbers are “comparable” (maybe this isn’t the best word). And that’s why they can be put on the same number line.

But they are not equatable. I can get as close as I want, and then some, but I cannot ever reconstruct the integer, when I’m in the land of the reals.

Apologies if all this sounds rather long winded. I tried to mention afront that this was just my personal journey of understanding. I’m sure I still have some, maybe all, of the technicalities wrong. Based on feedback (e.g. see this comment), it seems that my conclusions are actually incorrect. And even if my conclusions were to be correct I’m certain that there are better expositions out there of what I’m trying to say here.

So what happened after I figured this?

Well, first of all I thanked the universe that there were people who knew more than me who had been tasked with creating and maintaining Haskell. They knew that integers and reals were different sets, and they shouldn’t be mixed, especially in a language that is trying to stand firmly on the grounds of set theory (on as firm a ground as it can provide).

Now that I see why Haskell has this intricate Num hierarchy, I find myself in a better mood when dealing with the artifacts of the how.

Practically, there is nothing I want to change. Things are not perfect: even after this oration of praise for the Haskell gods I still find myself muttering blasphemies when I open GHCi after a gap of time away from Haskell and find myself wanting to evaluate a simple mathematical expression, and the errors start piling in.

But I’m sure over time Haskell will evolve simpler ways of representing the Num hierarchy than what it currently has. In fact, if I use Haskell continuously for a while I find that these conversions aren't a problem.

I once actually wondered why (that these errors don't turn out to be as big a problem as they seem on first contact). I think it has to do with two factors:

In a real program, I usually provide types for the top level expressions. This tends to sort things out.
But more importantly, the real issue here is not the Num hierarchy, it is the incomprehensibility of the GHC error messages for a beginner.

I think, bar none, GHC is the most helpful, solid, compiler I have encountered, and its error messages are exemplary. The problem isn't with GHC, the problem is that the beginner mind (or me, when I reopen GHCi after having forgotten everything from my last affair with it) doesn’t have the requisite knowledge to understand what GHC is saying.

The solution, of course, as is with many things in Haskell, is to git gud, but in this particular case this illusive issue is compounded by the fact that both these two factors interact: A beginner is the one who is more likely to enter toy mathematical expressions in a GHCi prompt without any accompanying type bindings, and it is the beginner who is least qualified to understand what’s going on when GHC rebukes them for underspecifying what they want.

As I said, practically, when writing larger programs, I have yet encountered much need to use fromIntegral etc, and the problem sort of sorts itself out. Things can be better, of course, but it is no reason to throw away the important distinction between integers and reals like, say, JavaScript does.

On the flip side, do I think that JavaScript (or other similarly-stanced languages) should incorporate this distinction? No, what they have is fine too, and is better suited for the kind of work I do in them.

I’m happy that both exist, and have retained their differences.

Curiously enough, real numbers are not “closed”. There are primitive operations I can perform on real numbers that will give me back a thing that is not in the set of real numbers.

Let’s start by adding two numbers. Some smart ass would soon get bored of adding them again and again, and would say, hey why don’t we call this repeated addition as multiplication, and by the way, here’s a formula to do it in one shot.

Elated, the rest of us would get back to multiplying these numbers again and again, until the next smarter ass turns up and says, hey why don’t we call this repeated multiplication as exponentiation, and here’s a formula to do it in one shot too.

The curious bit we run into is that if we take a real number, say -1.0, and exponentiate it using another real number, say 0.5, we get back something that is not a real number. Instead we get back what can be thought of as a pair of real numbers.

Such pairs of real numbers are called complex numbers.

Even more curious (because of its arbitrariness) is the fact that complex numbers are closed. I can do things with two complex numbers and I'll always get back a complex number, not a new thing. I find it curious because it is like finding a hardcoded magic constant — it’s like god wrote an #define that 2 dimensional real numbers are the first dimension of real numbers that are closed. Why not 3 dimensional? Wouldn’t it have been less “hardcoded” if there was no specific N-dimensional real numbers that were closed under primitive operations, and they would’ve continued on generating new and new types?

While I could continue pondering on topics that I don’t understand ™️, this aside reminds me of some thing that I would indeed want changed: Complex Int in Haskell.

Recently I found myself needing complex numbers, but those with discrete integral components. These (as was the point of this entire post) are not the same thing as the actual, real valued, complex numbers, but they’re quite useful. They also have a name: like everything else in maths, they’re named after Gauss, and are called Gaussian Integers.

So it was with glee I saw in my local Haddock that GHC indeed ships with a Num a => Complex a type in the standard installation. It took a frustrating hour for this glee to turn into a sigh as I figured that Complex Int is useless when we actually try to do any operations on it (because the Num instance of Complex Int has a RealFloat constraint).

Hence, after all these words, if there is something practical I get to wish for, it is that someone who understands more maths than I do fixes up the instances so that Complex Int is not just a gluesticked mantelpiece.

If you’re still reading, thanks! and I hope you enjoyed it.

Manav Rathi
Jan 2024