Forget what you have seen - numbers on the number line, complex numbers on the xy graph.
Numbers are things. All that we know about these things are the equations that relate them to other things (numbers). Numbers don't actually lie on the number line; that is a convenient visualization, but like all analogies it starts to be harmful when we insist on mistaking the map for the territory.
All that can be said about a number is the relations that describe it in terms of other numbers. Everything else is commentary.
A common pattern in these equations is to find an expression of the sort

$$e^{i\theta}$$
This confused me for the longest time. Now I know what it means - it is a number on the unit circle! These numbers are common since they are a convenient way of describing a rotation, or a phase shift of a periodic process. Let us arrive at this conclusion piece by piece.
Firstly, what is $e$?
Consider the following equation:

$$\exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$
There are reasons† why the sequence of operations described by the right hand side arises commonly in nature's undertakings, and in our own. So much so that we have given a name to this function - it is called the exponential function.
† e.g., it is the fixed point of differentiation. That is, $\exp$ is the (only) function whose derivative is equal to itself. That is, the value of $\exp$ at any number $x$ is also equal to the rate of change of $\exp$ at $x$.
What is the value of this function when we evaluate it at 1?
```
$ node
> factorial = (n) => n == 0 ? 1 : n * factorial(n - 1)
[Function: factorial]
> exp = (x) => Array(99).fill().reduce((s, _, i) => s + x ** i / factorial(i), 0)
[Function: exp]
> exp(1)
2.7182818284590455
```
That looks familiar - it is $e$.
So $e$ is the value of the exponential function when it is evaluated at $1$. Since $e$, the constant, is so ubiquitous, it has somewhat come to overshadow the function it originates from, and sometimes it just stands in for the entire function itself. For example, people sometimes write $e^x$ when what they mean is $\exp(x)$.
While confusing, this is mathematically correct, because the exponential function obeys the following identity:

$$\exp(x + y) = \exp(x) \cdot \exp(y)$$
This can be seen by plugging the values into the definition of $\exp$ above and doing some symbolic algebra to convince ourselves. Assuming we're convinced, then it is easy to see that, for example,

$$\exp(2) = \exp(1 + 1) = \exp(1) \cdot \exp(1) = e \cdot e = e^2$$
So $e^2$ is equivalent to $\exp(2)$.
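We can check this equivalence numerically in node. This is just a sketch: it reuses the series-based `exp` from the earlier session (redefined here so the snippet is self-contained).

```javascript
// Series-based exponential, as in the earlier node session.
const factorial = (n) => n === 0 ? 1 : n * factorial(n - 1);
const exp = (x) => Array(99).fill().reduce((s, _, i) => s + x ** i / factorial(i), 0);

// exp(2) should equal exp(1) * exp(1), i.e. e squared.
console.log(exp(2));          // ≈ 7.389056...
console.log(exp(1) * exp(1)); // ≈ 7.389056...
```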
This equivalence seems irrelevant; multiplying $e$ by itself $2$ times seems a good enough way to arrive at the same result, and it also matches the notation for raising to a power that we're already familiar with, so why remind ourselves that $e^x$ is a shorthand for an underlying function evaluation?
Because it allows us to see what complex exponentiation means.
Complex numbers are a tragedy of nomenclature. They are neither complex, nor imaginary.
This is perhaps an opportune point to repeat the diatribe I started with. Numbers are just things defined by their relations; nothing more can be said about them. We however get misled by the labels attached to some of them (e.g. "complex" numbers) or by their visual analogies (showing complex numbers on a 2D plane) into thinking of them as something they're not.
So forget the fact that a complex number has two components, or that it is different from a real number. Just think of all of them, real or complex, as just things that we can relate to each other, and all the ways they relate to other numbers is their very definition, they have no inherent existence of their own.
One such relation is

$$x^2 = -1$$
The number that satisfies this relationship is called $i$. It is a complex number; numbers that we're less familiar with intuitively (but I'm sure if we give it a few hundred years, children will find complex numbers as intuitive as we find real numbers today), but it is in no way unreal. It is as real as the real numbers, and in fact more so, because the complex numbers are closed while the real numbers are incomplete: the question we just asked, the square root of minus 1, an eminently reasonable question, does not have its answer in the real numbers; but no matter what you do to complex numbers, you still get back a complex number.
Figuratively and literally, complex numbers open another dimension to us. But when we start putting complex numbers into the relations we had previously used with only real numbers, we face new questions. For example, what does it mean to raise $e$ to the power of a complex number? Or more specifically, since we've only seen one complex number so far, $i$, what does $e^i$ mean?
The repeated multiplication doesn't work. We effortlessly think of $e^2$ as $e$ multiplied by itself $2$ times, $e \cdot e$, but how do we multiply $e$ by itself $i$ times?
We're asking the wrong question, and getting confused because we're mistaking the notational shorthand for the real thing. $e^i$ is a shorthand for $\exp(i)$, so instead of wondering what $e^i$ is, what we actually want to know is what $\exp(i)$ is.
This is a simple question with an easy to obtain answer. We can just plug $i$ into the definition of $\exp$:

$$\exp(i) = 1 + i + \frac{i^2}{2!} + \frac{i^3}{3!} + \frac{i^4}{4!} + \cdots$$
Hmm. All these symbols Manav, what do they mean?
Unless you're the sort of person who skips footnotes, you'd remember how we talked about $\exp$ being a fixed point of differentiation. Let's putter around with that thought.
At the simplest, we can think of a function whose value never changes. The number of suns in the sky†. No matter how long after birth (i.e., $x$) I'm trying to evaluate the number of suns in the sky (i.e., $f(x)$), I still get back the same answer $f(x) = 1$, which never changes (i.e., it has a rate of change of $0$, or $f'(x) = 0$).
† It is hard to come up with physical examples that are strictly true. One can come up with exotic examples, say considering $f(x)$ as the speed of light in a vacuum at some space-time coordinate $x$, something that is generally taken as unchanging, but I have my doubts. I feel that the more we refine physics, the more we'll find these fundamental constants to also be changing. The only sure-shot examples of constant functions that I can think of are mathematical in nature, say defining $f(x)$ as the ratio of the circumference of a circle to its radius.
As Heraclitus mentioned, perhaps the only constant is change, though I'm not sure how to formulate that as an equation. $f'(x) \neq 0$?‡
‡ There is a mathematical convention (I think?) that I realized after an embarrassingly long time: constants are represented by $a$, $b$, $c$, etc., while variables are represented by $x$, $y$, $z$, etc.
Next up, we can think of a function whose rate of change is constant - that is, its rate of change does not depend on the input $x$. Every day my age increases by one day (i.e., $f'(x) = c$, where $c$ is some constant, in this case $1$), independent of my age (i.e., $f(x)$) or how long after birth I'm trying to evaluate my age (i.e., $x$).
These functions look like lines, which is why they are called linear. Let us mentally tag this group as functions whose derivative is a constant.
While I can't explain what the derivative is in a footnote, what I do want to re-emphasize is that the derivative is an operator - it takes a function and returns a function. This is different from regular functions, which take a number and return a new number. So the derivative of a function $f$, written $f'$, is another function, say $g$. This sounds complicated when put into words, but those of you who have written code would recognize this sort of meta-function as quite common in programming, and they're not all that complicated either.
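In code, an operator of this sort is just a higher-order function. Here is a minimal sketch (not from the original post): a `derivative` function that takes a function and returns a new function, approximating the derivative numerically with a central difference. The step size `h` is an arbitrary small choice.

```javascript
// The derivative as an operator: takes a function, returns a function.
// Approximated numerically with a central difference; h is a small, arbitrary step.
const derivative = (f) => {
  const h = 1e-6;
  return (x) => (f(x + h) - f(x - h)) / (2 * h);
};

const square = (x) => x * x;  // f(x) = x^2
const g = derivative(square); // g(x) ≈ 2x, the derivative of x^2
console.log(g(3));            // ≈ 6
```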
Note that we can think of $0$ as just another constant, so our previous group of unchanging functions really is part of this current group of functions that we are considering. Indeed, constant functions are also lines, just horizontal ones, so both these groups are the same thing that way.
If we consider them as distinct groups, we can start building a tower of changeability. That sounds interesting, let's try that.
This allows us to find out the rate of change of the rate of change, i.e., the second derivative. We just go one step up in the tower. So if we start with a linear function $f(x) = x$, like my age, its derivative would be the constant function $f'(x) = 1$, and we can look up its derivative one row up in the tower to find that the second derivative of my age is $f''(x) = 0$.
Enough words, I think I'm belabouring the point. Let's move on with our sequence.
Next up, we can think of functions whose rate of change when evaluated for some number $x$ is in some way related to the value of $x$ itself. These are functions of the form $f(x) = x^2$. For example, if I go to a meeting and before the meeting starts, each person present does a handshake with each other person present there, then the number of handshakes we'll end up doing, $f(x)$, is (almost) half the square of the number of people $x$ - for a meeting of 7 people, there will be 21 handshakes. These numbers grow large very quickly, because for each new person added to the meeting, the number of handshakes increases by (roughly) the number of people now in the meeting.
We can continue this pattern. For example, we can think of functions whose rate of change when evaluated for some number $x$ is in some way related to $x^2$. These turn out to be functions of the form $f(x) = x^3$. Putting these guys in our tower,
Does this tower ever reach a function whose rate of change, when evaluated for some value $x$, is in some way related to the value of the function itself?
Yes! Consider

$$f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$
For this function, the rate of change is equal to the function itself. That is, $f'(x) = f(x)$.
What is this $f$? Why, it is the exponential function, $\exp$, that we'd been talking about earlier. And this is what it means for it to be a fixed point - unlike the other functions we've seen so far, no matter how many times we take the derivative of $\exp$, we get back the exact same function again. Put differently, it forms a (1-)cycle, like the ouroboros, the snake eating its own tail.
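We can poke at this fixed-point property numerically. The following sketch (my own, self-contained) combines the series-based `exp` with a numerical derivative and checks that the two agree at an arbitrary point.

```javascript
// Series-based exponential, as in the earlier node session.
const factorial = (n) => n === 0 ? 1 : n * factorial(n - 1);
const exp = (x) => Array(99).fill().reduce((s, _, i) => s + x ** i / factorial(i), 0);

// Central-difference derivative operator; 1e-6 is an arbitrary small step.
const derivative = (f) => (x) => (f(x + 1e-6) - f(x - 1e-6)) / 2e-6;

const dexp = derivative(exp);
console.log(exp(1.5), dexp(1.5)); // both ≈ 4.4816... — exp is its own derivative
```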
So a natural next question to ask would be - is there a function that forms a 2-cycle? That is, a function such that if we take its derivative twice, we get back the same function again?
Somewhat surprisingly, there is! The pair $\sin$ and $\cos$ cycle back to each other after two steps:

$$\sin'(x) = \cos(x), \qquad \cos'(x) = -\sin(x)$$
There is a slight asymmetry, in that we get back the negative of what we started with, and I don't know what to make of it except perhaps that this sign flip is what makes this a 2-cycle instead of a 1-cycle.
This same 2-cycle works even if we start with the cosine function instead: $\cos''(x) = -\cos(x)$.
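As a sanity check, here is a small sketch (my own) that applies the numerical derivative operator twice to `Math.sin` and confirms we get back the negative of the sine:

```javascript
// Central-difference derivative operator; 1e-5 is an arbitrary small step.
const derivative = (f) => (x) => (f(x + 1e-5) - f(x - 1e-5)) / 2e-5;

// Apply it twice: the second derivative of sin should be -sin.
const sin2 = derivative(derivative(Math.sin));
console.log(sin2(0.7));      // ≈ -0.644
console.log(-Math.sin(0.7)); // ≈ -0.644
```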
So these functions are like yin and yang, each engendering the other, again and again, ad infinitum. Philosophical glee aside, this behaviour is indeed quite curious, and one would imagine that there must be some internal similarity between these two functions and the exponential function, which forms a cycle by itself, for these two to form a cyclic pair. Or viewed from the other end, it is natural to wonder if we can somehow combine $\sin$ and $\cos$ to get $\exp$?
Let's look at the formula for the exponential function again:

$$\exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$
I gave you the formula, and someone gave it to me, but what if I told you we can derive it from first principles?
Alright, here goes. Let us try to come up with a polynomial to approximate $\exp$ without using the definition above. As a reminder, a polynomial is a function that looks like this:

$$p(x) = a + bx + cx^2 + dx^3 + \cdots$$
That is, it is a sum of successive powers of the input to the function. Each power has a constant factor associated with it, to "scale" its contribution to the function. This constant can also be zero, in which case that particular power of $x$ will not be involved at all.
The highest power of $x$ with a non-zero constant associated with it is called the degree of the polynomial. Polynomials of degree 0 are constants, of degree 1 are lines, and of degree 2 are parabolas.
Since polynomials of degree 2 too are quite common, they also have a nickname – they're called quadratic functions, or quadratic polynomials. Let's start by approximating $\exp$ using one of them. It will have the form:

$$f(x) = a + bx + cx^2$$
We will make use of two facts:

1. The value of $\exp$ at $0$ is $1$, i.e. $\exp(0) = 1$.
2. $\exp$ is its own derivative, i.e. $\exp' = \exp$.
Where do we start? Well, since $\exp(0) = 1$, we can start by making our approximation also equal to $1$ at $0$. This lets us deduce the value of the constant $a$:

$$f(0) = a = 1$$
Alright. Next up, we can try to imagine that if, around the input $0$, our approximation should have the same "shape" as $\exp$, then it should have the same derivative as that of $\exp$ at $0$. Let us find the derivative of $f$:

$$f'(x) = b + 2cx$$
And since $\exp$ is its own derivative, we can easily deduce that the derivative of $\exp$ at $0$ is $\exp(0) = 1$.
This lets us deduce the value of the constant $b$, since we're setting the derivative of $f$ at $0$ to be equal to the derivative of $\exp$ at $0$ to give our approximation the same slope:

$$f'(0) = b = 1$$
Continuing with this shape chasing, we can set the second derivative of $f$ at $0$ to be equal to the second derivative of $\exp$ at $0$, so that the "shape" of our approximation around $0$ is the same as the shape of $\exp$ around $0$. The second derivative of $f$ is

$$f''(x) = 2c$$
And the second derivative of $\exp$ at $0$ is $1$.
And we can combine these two pieces of information to deduce the value of $c$:

$$2c = 1 \implies c = \frac{1}{2}$$
So we have managed to deduce all three of the constants for our quadratic approximation, giving us

$$f(x) = 1 + x + \frac{x^2}{2}$$
If you notice above, when we were deducing the constants for each term, the previous constants did not matter – they effectively get wiped out when we take the $n$th derivative. What this means is that to extend our approximation by a degree, we just need to consider the next derivative to compute the constant for the latest degree; the previous approximation otherwise remains valid.
So let us go one degree higher, and consider degree 3 polynomials, affectionately called cubic polynomials. That is, let us find an approximation to $\exp$ of the form

$$f(x) = a + bx + cx^2 + dx^3$$
As I mentioned, the way we're deducing these constants means that the previous ones remain untouched, so we already know the values of $a$, $b$ and $c$; we just need to deduce $d$.
We continue our shape chasing, and set the third derivative of $f$ at $0$ to be equal to the third derivative of $\exp$ at $0$. The third derivative of $f$ is

$$f'''(x) = 6d$$
And the third derivative of $\exp$ at $0$ is $1$.
Combining these two pieces of information, we can deduce the value of $d$:

$$6d = 1 \implies d = \frac{1}{6} = \frac{1}{3!}$$
Giving us the cubic equation

$$f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}$$
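How good is this cubic already? Here is a quick sketch (my own) comparing it against the built-in `Math.exp` - close to $0$ it is nearly exact, and it drifts as we move away:

```javascript
// The cubic approximation we just derived: 1 + x + x^2/2! + x^3/3!
const cubic = (x) => 1 + x + x ** 2 / 2 + x ** 3 / 6;

console.log(cubic(0.1), Math.exp(0.1)); // ≈ 1.10517 vs 1.10517 — nearly exact near 0
console.log(cubic(1), Math.E);          // ≈ 2.6667 vs 2.7183 — drifts further out
```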
Do you see where we're going? We're getting something that very closely resembles the formula for the exponential function!
This is not a coincidence. If we continue this process, increasing the degree of the polynomial one by one, we will get this same formula. It is an infinite degree polynomial, and these are called series†. Further, this turns out not to be an approximation, but is exactly equal to ; indeed, that's how we first defined .
† This particular one is called a Taylor series.
We should pause here for a minute to marvel at the magnificent vista we've reached. We were able to deduce the exact definition of a function just from the information of its derivatives at a single input. The derivatives at a single value contained the definition of the entire function!?
You might think that this is a special case: $\exp$ is the fixed point of the derivative operator, and maybe there is some hoogedly poogedly going on for that special case. Well, you're not entirely wrong, but you're not entirely right either - $\exp$ isn't the only function for which we can use the above method to derive its definition just from the values of its derivatives at some input. There are other such functions†. Can you guess two of them?
$\sin$ and $\cos$!
† You're partially right because this only works for a special class of functions, the so called analytic functions.
If you were to repeat a process very similar to what we used above (no advanced mathematics needed - just knowledge of the derivatives of $\sin$ and $\cos$, and their values at 0; indeed, the Indian mathematician Madhava was able to derive them 700 years ago, when calculus as we now know it hadn't even been formulated), you will be able to arrive at the following series representations of them:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$$
Whoa. If you squint at these closely, and ignore the minus signs for a minute, you'd see how the series for $\sin$ and $\cos$ seem like two pieces of a jigsaw puzzle that combine to give us the series for $\exp$.
Of course, we can't ignore the minus signs. But even before that, it is possible that you might be feeling lost as to what it is that we're trying to do here.
Let me rewind the tape. We were trying to find the meaning of the following equation:

$$\exp(i) = 1 + i + \frac{i^2}{2!} + \frac{i^3}{3!} + \frac{i^4}{4!} + \cdots$$
This was the equation we'd obtained when we had plugged the complex number $i$ into the definition of $\exp$.
Since we didn't know how to make sense of it, I had taken you on a diversion, where we'd seen how smooth functions can be written in terms of their Taylor series, and in particular, we found out the Taylor series for $\sin$ and $\cos$.
This diversion was useful, since I didn't want to make it appear as if I'd simply mandated the series for $\sin$ and $\cos$; we were able to get a sense of how they arise.
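If you'd like to convince yourself that these series really do compute the sine and cosine, here is a sketch in node (my own, in the style of the earlier `exp` snippet) that sums the first 20 terms of each and compares against the built-ins:

```javascript
// sin and cos from their Taylor series; 20 terms is plenty for small inputs.
const factorial = (n) => n === 0 ? 1 : n * factorial(n - 1);
const sin = (x) => Array(20).fill()
  .reduce((s, _, n) => s + (-1) ** n * x ** (2 * n + 1) / factorial(2 * n + 1), 0);
const cos = (x) => Array(20).fill()
  .reduce((s, _, n) => s + (-1) ** n * x ** (2 * n) / factorial(2 * n), 0);

console.log(sin(1), Math.sin(1)); // both ≈ 0.84147...
console.log(cos(1), Math.cos(1)); // both ≈ 0.54030...
```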
Either way, now let us do some basic algebra with the series expansion of $\exp(i)$. The two things you need to keep in mind are:

1. $i^2 = -1$, so the powers of $i$ cycle: $i, -1, -i, 1, i, \ldots$
2. We can rearrange the terms of the series, grouping the "real" terms and the terms containing $i$ separately.
Alright, let's go.

$$\exp(i) = 1 + i - \frac{1}{2!} - \frac{i}{3!} + \frac{1}{4!} + \cdots = \left(1 - \frac{1}{2!} + \frac{1}{4!} - \cdots\right) + i\left(1 - \frac{1}{3!} + \frac{1}{5!} - \cdots\right) = \cos(1) + i\sin(1)$$
Aha. So $\exp(i) = \cos(1) + i\sin(1)$. That is, the value $\exp(i)$, which is the result of evaluating the exponential function at the complex number $i$, is a complex number whose "real" part (i.e. the x-component) is $\cos(1)$, and whose "imaginary" part (i.e., the y-component) is $\sin(1)$.
But that's not all. The above algebra works even if we multiply the input $i$ by any arbitrary number $\theta$. That is,

$$\exp(i\theta) = \cos(\theta) + i\sin(\theta)$$
This is known as Euler's formula, after the Swiss mathematician Leonhard Euler who noticed it 300 years ago. It makes precise our intuitive guess from the last section, where we thought that $\sin$ and $\cos$ were like pieces of a jigsaw puzzle that should fit together to form $\exp$. Except that this jigsaw puzzle cannot be assembled in the domain of so called real numbers; we need to level up and use the "really real" numbers - i.e., complex numbers - to see this truth.
The pudding is yet to come. Now that we know that the result of evaluating the exponential function at any multiple $\theta$ of the complex number $i$ will be a complex number with the rectangular form $\cos\theta + i\sin\theta$, and recalling that the length (aka magnitude) of a complex number is the square root of the sum of the squares of its two rectangular form components, we can use basic arithmetic to deduce that the length of such numbers is always 1:

$$|\exp(i\theta)| = \sqrt{\cos^2\theta + \sin^2\theta} = \sqrt{1} = 1$$
The last step follows from the fact that if you draw a right angle triangle from the origin to any point on the circle of radius $r$, then Pythagoras' 2500 year old discovery tells us that

$$a^2 + b^2 = r^2$$
where $a$ and $b$ are, respectively, the projections of this point on the x- and y- axes. Dividing both sides by $r^2$, and noticing that $b/r$ and $a/r$ are the definitions of the sine and cosine functions,

$$\cos^2\theta + \sin^2\theta = 1$$
So we've deduced that the value of $\exp(i\theta)$ is always a complex number of length 1. What does that mean? It means that these points all lie on the unit circle!
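We can watch this happen in node. The sketch below (my own) runs the exponential series with a minimal complex number type (plain `[re, im]` pairs) and checks that for an arbitrary $\theta$ the result matches $\cos\theta + i\sin\theta$ and has length 1:

```javascript
// exp(z) for complex z, via the series, using [re, im] pairs as complex numbers.
const factorial = (n) => n === 0 ? 1 : n * factorial(n - 1);
const mul = ([a, b], [c, d]) => [a * c - b * d, a * d + b * c]; // complex multiply
const add = ([a, b], [c, d]) => [a + c, b + d];                 // complex add

const cexp = (z) => {
  let sum = [0, 0], power = [1, 0]; // power holds z^n as we go
  for (let n = 0; n < 40; n++) {
    sum = add(sum, mul(power, [1 / factorial(n), 0])); // add z^n / n!
    power = mul(power, z);
  }
  return sum;
};

const theta = 2.5; // an arbitrary choice
const [re, im] = cexp([0, theta]); // exp(iθ)
console.log(re, Math.cos(theta));  // both ≈ -0.8011...
console.log(im, Math.sin(theta));  // both ≈ 0.5984...
console.log(Math.hypot(re, im));   // ≈ 1 — on the unit circle
```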
So there you have it. The exponential function takes complex numbers that lie on a straight line, the so called imaginary axis (all and any multiples of $i$, i.e. complex numbers of the form $i\theta$ for any arbitrary value $\theta$), and maps them onto complex numbers that all lie on the unit circle (a circle is a 2D creature, and lives on the complex plane). Because of an abuse of notation, this is sometimes written as

$$e^{i\theta} = \cos\theta + i\sin\theta$$
but to avoid confusing ourselves, it is better to see that the above is a shorthand for the relationship

$$\exp(i\theta) = \cos(\theta) + i\sin(\theta)$$
A circle epitomizes periodic motion, and so this relationship is ubiquitous anytime we're dealing with phenomena that exhibit periodic motion, as it allows us to map linear inputs (points on the imaginary axis) to periodic ones (points on the unit circle).
We also took a detour that showed us that two other functions commonly brought up as examples of periodic motion - $\sin$ and $\cos$ - are actually components of the quintessential periodic motion: movement around a circle.