Decoding Leibniz notation - Slava Akhmechet

I wrote this for myself to understand the Leibniz notation. Prerequisites for this post are the definition of the derivative and the Lagrange notation. If you don’t understand these yet, please study them first.

You may have already seen something like $\frac{dy}{dx}$. This is called the Leibniz notation. The Leibniz notation has many of what Spivak calls “vagaries”. It has multiple interpretations, formal and informal. The informal interpretation doesn’t map to modern mathematics, but can sometimes be useful (while at other times misleading). The full, unambiguous Leibniz notation is verbose, so in practice people end up taking liberties with it. As a consequence, its meaning must often be discerned from the context.

This flexibility makes the notation very useful in science and engineering, but also makes it difficult to learn. I explore it here to make learning easier.

Historical motivation

We start with the historical interpretation, where the notation began. Leibniz didn’t know about limits. He thought the derivative is the value of the quotient

$$\frac{f(x+h)-f(x)}{h}$$

when $h$ is “infinitesimally small”. He denoted this infinitesimally small quantity of $h$ by $dx$, and the corresponding difference $f(x+dx)-f(x)$ by $df(x)$. Thus for a given function $f$ the Leibniz notation for its derivative $f'$ is:

$$\frac{df(x)}{dx}$$

Intuitively, we can think of $d$ in a historical context as “delta” or “change”. Then we can interpret this notation as Leibniz did: a quotient of a tiny change in $f(x)$ and a tiny change in $x$. But this explanation comes with two important disclaimers.

First, $d$ is not a value. If it were a value, you could cancel out the $d$’s in the numerator and the denominator. But you can’t. Instead think of $d$ as an operator. When applied to $f(x)$ or $x$, it produces an infinitesimally small quantity. Alternatively you can think of $df(x)$ and $dx$ as one symbol each, which happens to look like multiplication but isn’t.¹

Second, $\frac{df(x)}{dx}$ denotes a function (the same one denoted by $f'$), not a value at a point (i.e. not $f'(a)$). To denote the image of the derivative function at $a$ we use the following notation:

$$\left.\frac{df(x)}{dx}\right|_{x=a} = f'(a)$$

Writing all that is a pain and in practice people rarely do it this way, but we’ll get to that in a minute.
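For readers who think in code, here is a minimal Python sketch of the second disclaimer. The names `derivative` and `square` are made up for illustration, and the step `h` is an ordinary small real number rather than an infinitesimal, so the result is only an approximation of $f'$.

```python
from typing import Callable

RealFunction = Callable[[float], float]


def derivative(f: RealFunction, h: float = 1e-6) -> RealFunction:
    """Return a function that approximates f' with the difference quotient.

    h is a small ordinary number, so this only approximates the derivative.
    """
    def f_prime(x: float) -> float:
        return (f(x + h) - f(x)) / h

    return f_prime


def square(x: float) -> float:
    """Hypothetical example function f(x) = x^2."""
    return x * x


# Plays the role of df(x)/dx: the result is a function, not a number.
d_square = derivative(square)

# Plays the role of df(x)/dx evaluated at x = 3: now we get a number.
slope_at_3 = d_square(3.0)
print(slope_at_3)  # close to 6, up to approximation and rounding error
```

The point of the sketch is the types: `derivative(square)` is a function, and only applying it to a point yields a value.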

Modern interpretation

In modern mathematics real numbers do not have a notion of infinitesimally small quantities. Thus in a modern interpretation we treat $\frac{df(x)}{dx}$ as a symbol denoting $f'$, not as a quotient of numbers. Nothing here is being divided, nothing can be canceled out. In a modern interpretation $\frac{df(x)}{dx}$ is just one thing that happens to look like a quotient but isn’t, any more than $f'$ is a quotient.

Second derivative

The question arises of how to express the second (or $n$th) derivative in the Leibniz notation. Let $g(x) = \frac{df(x)}{dx}$ (i.e. let $g$ be the first derivative of $f$). Then it follows that the second derivative in Leibniz notation is $\frac{dg(x)}{dx} = g' = f''$. Substituting the definition of $g$ we get:

$$\frac{d\left(\frac{df(x)}{dx}\right)}{dx} = f''$$

Of course this is too verbose and no one wants to write it this way. This is where the vagaries begin. For convenience people use the usual algebraic rules to get a simpler notation, even though formally everything is one symbol and you can’t actually do algebra on it:

$$\frac{d^2 f(x)}{dx^2} = f''$$

First, why $dx^2$? Shouldn’t it be $(dx)^2$? One way to answer this question is to remember that $dx$ is one symbol, not a multiplication (because $d$ is not a value). And so we’re just squaring that one symbol $dx$, which doesn’t require parentheses.

Another, probably more honest, way to answer this question is to recall that this isn’t real algebra; we just use a simulacrum of algebra out of convenience. But convenience is a morally flexible thing, and people decided to drop parentheses because they’re a pain to write. So $(dx)^2$ became $dx^2$.

Second, we said before that $df(x)$ can be thought of as one symbol. Then what is this $d^2$ business? The answer here is the same: we aren’t doing real algebra but a simulacrum of algebra. We aren’t really squaring anything; we’re overloading exponentiation to mean “second derivative”. The symbol $d^2f(x)$ is again one symbol.
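One way to see that nothing is being squared is to sketch the second derivative in code as the derivative operator applied twice. This is only a rough numerical illustration: `derivative` and `cube` are hypothetical names, and the step `h` is an ordinary small number, so the values are approximate.

```python
from typing import Callable

RealFunction = Callable[[float], float]


def derivative(f: RealFunction, h: float = 1e-4) -> RealFunction:
    """Return a function that approximates f' with the difference quotient."""
    def f_prime(x: float) -> float:
        return (f(x + h) - f(x)) / h

    return f_prime


def cube(x: float) -> float:
    """Hypothetical example function f(x) = x^3, so f''(x) = 6x."""
    return x * x * x


g = derivative(cube)          # g plays the role of df(x)/dx
second = derivative(g)        # plays the role of d(df(x)/dx)/dx, i.e. f''

second_at_2 = second(2.0)
print(second_at_2)  # close to 12, up to approximation and rounding error
```

The “2” in $d^2$ corresponds to calling `derivative` twice, not to multiplying anything by itself.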

Liberties and ambiguities

There are a few more liberties people take with the Leibniz notation. Let $f(x) = x^2$. If we want to denote the derivative of $f$ we can do it in two ways:

$$\frac{df(x)}{dx} = \frac{dx^2}{dx}$$

Here $\frac{dx^2}{dx}$ is new, but the meaning should be clear. We’re just replacing $f(x)$ in $df(x)$ with the definition of $f(x)$. This is a little confusing because in the particular case of $f(x) = x^2$, it’s visually similar to the notation for the second derivative. There are no ambiguities here so far; it’s just a visual artifact of the notation we have to learn to ignore. But now the liberties come.

Suppose we wanted to state what the derivative of $f$ at a point $a$ is. In Lagrange notation we say $f'(a) = 2a$. In Leibniz notation the proper way to say it is:

$$\left.\frac{dx^2}{dx}\right|_{x=a} = 2a$$

But this is obviously a pain, so people end up taking two liberties. First, everyone drops the vertical line that denotes the application at $a$. So in practice the form above becomes:

$$\frac{dx^2}{dx} = 2x$$

This shouldn’t “compile” because $\frac{dx^2}{dx} = f'$. Thus this statement is equivalent to saying $f' = 2x$, which should be a syntax error. But this is the notation most people use, and you have to get used to it.

Second, people decided that writing $\frac{df(x)}{dx}$ is too painful, and in practice everyone writes $\frac{df}{dx}$. This also shouldn’t compile (it would be something like writing $\lim_{x \to a} f$, which is also a syntax error). But again, it’s the notation most people use.

Even more liberties

You’d think that we already pushed the notation past all limits of propriety, but scientists and engineers manage to push it even further. Consider the following simple problem. A circle’s radius is growing at 1 inch per second. How quickly is the area of the circle growing? Let’s solve it with Lagrange’s notation first.

The area of a circle is $A = \pi r^2$. We’re trying to understand change by using derivatives to analyze the behavior of functions. Since $r$ is changing, what we’ll be looking at is the function for the area of a circle, $A(r) = \pi r^2$. And since the radius is changing with time, we have another function for the radius at a particular time, $r(t)$. The problem doesn’t tell us how $r$ is defined, but it tells us its derivative is $r'(t) = 1$. All we have to do now is take the derivative of $A \circ r$ using the chain rule:

$$(A \circ r)'(t) = A'(r(t)) \cdot r'(t) = 2\pi r(t) \cdot 1 = 2\pi r(t)$$

Now here’s the rub. In science and engineering most values are somehow related to other values, and nearly everything is related to time. Explicitly defining functions makes even simple relationships (like the one above) complicated to write down. So people dispense with denoting functions explicitly, and just treat these quantities as functions. In practice, the Leibniz notation for the equation above is something like this:

$$\frac{dA}{dt} = \frac{dA}{dr} \cdot \frac{dr}{dt} = 2\pi r \cdot 1 = 2\pi r$$

We’re not explicitly defining or mentioning functions anywhere, but immediately proceed with the understanding that the variables $A$ and $r$ are really functions.
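To make the hidden functions visible, here is a small Python sketch of the circle problem. The problem never says what the radius function is, so `radius(t) = 1 + t` is purely an assumption for illustration (any function whose rate of change is 1 would do), and the step `h` is an ordinary small number, so the numeric rate is approximate.

```python
import math


def area(r: float) -> float:
    """Area of a circle as a function of its radius: A(r) = pi * r^2."""
    return math.pi * r * r


def radius(t: float) -> float:
    """Hypothetical radius function with r'(t) = 1 (inches per second)."""
    return 1.0 + t


def area_at_time(t: float) -> float:
    """The composition A(r(t)) that the shorthand dA/dt silently refers to."""
    return area(radius(t))


t = 2.0
h = 1e-6

# Difference quotient of the composed function: approximates dA/dt at t.
numeric_rate = (area_at_time(t + h) - area_at_time(t)) / h

# Chain rule result: dA/dr * dr/dt = 2 * pi * r * 1.
chain_rule_rate = 2 * math.pi * radius(t) * 1.0

print(numeric_rate, chain_rule_rate)  # the two values should nearly agree
```

In the shorthand, the variable `A` stands for `area_at_time` and `r` stands for `radius`, without either ever being named as a function.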

As a matter of studying advice, I spent hours trying to understand exactly why anyone might want to do this and how the mechanics work, until I sat down to do a bunch of simple related rates problems, at which point abusing the notation in this way quickly became the most natural thing in the world. So if you’re stuck, go solve a bunch of simple problems and then come back here. Hopefully by then everything will make a lot more sense.

¹ I read somewhere that in his notebooks Leibniz experimented with extending $d$ with a squiggle on top that went over $x$ to indicate that $d$ is not a value, but I haven’t been able to verify whether that’s true.