In this note you'll learn where all the math behind quantum superposition and measurement actually comes from.
These intuitions are pretty rare to find: I've only ever seen them stated as an axiom or as the "Born Rule"!
Skip ahead to the new experiment (polarization filters) if you already know the double slit experiment and basic quantum mechanics.
You probably already know the experiment ( * ).

The electron here can't be described using known physics ( * ). That's where quantum physics comes in.
It looks like the probability distribution of finding the electron at a location on the screen was made by a water wave going through the two slits. So we conclude that that's exactly what happened: a wave did go through the two slits.
Here's an example ( * ).
Here's what the evolution looks like * :

People say "the electron is at multiple positions at once", because the electron is literally a wave, and waves occupy many positions at once. Scientists say the electron is in a "superposition" over multiple positions.
There's something very special about the screen. The screen causes the electron to occupy a specific position. Scientists say the screen "measures" or "observes" the wave, and causes it to "collapse" * .
Right after the electron hits the screen, the wave looks like this * :

In the universe, there's light and there are things with mass.
It turns out that all massive particles (like the electron) behave like waves. The electron isn't special.
It turns out that light also behaves the same way. If you know a little physics, you can reason through this * .
1. The electron is a wave, occupying multiple positions at once, in a superposition of multiple positions.
2. The screen measures or observes the electron, aka it decides a specific position for the electron based on the wave's height and collapses the wave.
Say we perform the double slit experiment, but only with the right half of the screen ( * ).
What happens when the wave hits the screen? Well, obviously, the universe needs to decide whether the electron hit the screen or not! We can use similar reasoning as above * to determine what happens:

Of course, the electrons that hit the screen still make the pattern * .
If you measure the wave in a particular region, the wave either gets localized to a point in the region (based on the wave height there), or now has height of 0 in the region:
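Here's a rough sketch of those two outcomes in Python. The assumptions are mine, just for illustration: the positions are chopped into a short list, `measure_region` is a made-up helper name, the heights are made-up numbers, and the outcome probabilities are taken proportional to the squared wave height (the rule this note arrives at later on).

```python
import random

def measure_region(heights, region, rng):
    """Measure a discretized wave with a screen covering the indices in `region`.

    Assumption: probabilities are proportional to squared height.
    Returns ("hit", wave) or ("miss", wave), where wave is the new list of heights.
    """
    weights = [abs(h) ** 2 for h in heights]
    p_hit = sum(weights[i] for i in region) / sum(weights)
    if rng.random() < p_hit:
        # Hit: the wave is localized to one point inside the region.
        spots = sorted(region)
        spot = rng.choices(spots, weights=[weights[i] for i in spots])[0]
        return "hit", [1.0 if i == spot else 0.0 for i in range(len(heights))]
    # Miss: the wave is now 0 inside the region; rescale what is left.
    rest = [0.0 if i in region else h for i, h in enumerate(heights)]
    norm = sum(abs(h) ** 2 for h in rest) ** 0.5
    return "miss", [h / norm for h in rest]

wave = [0.1, 0.4, 0.2, 0.5, 0.3, 0.6]   # made-up heights at 6 positions
right_half = {3, 4, 5}                   # the screen covers only these
outcome, new_wave = measure_region(wave, right_half, random.Random(0))
print(outcome, new_wave)
```

Running it with different seeds gives a mix of "hit" results (one spike inside the region) and "miss" results (zero inside the region, the rest of the wave untouched apart from rescaling).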

The double slit experiment is the best way to see superposition and measurement. Unfortunately, it's not the best starting point for describing them mathematically, because its superposition and measurement are over position, and there are infinitely many positions.
On the other hand, the "light filter experiment" (explained here) has superposition and measurement over just two things, so it's much simpler. So we'll start with this light experiment.
We can use Maxwell's Equations to determine what light looks like * .
Light has an electric field ("E-field") and a magnetic field ("B-field"), and the two are perpendicular to each other. Here's a simple case ( * ). Since the two fields oscillate together and are always perpendicular, if we know one field, we can fully determine the other. So we only need to describe one of them. Let's just consider the E-field * . In general, the E-field is given by:
$$\vec{E}(z,t) = a_1\cos(kz - wt + \phi_1)\,\hat{x} + a_2\cos(kz - wt + \phi_2)\,\hat{y} \tag{1}$$
This is just two perpendicular cosine waves added together! Both cosines travel together with the same speed and frequency (described by $k$, $w$, $z$, and $t$) * . We can change each cosine's individual size/"amplitude", $a_1$ and $a_2$, and its offset/"phase", $\phi_1$ and $\phi_2$.
( * ) We can get "linearly polarized" light by setting $\phi_1 = \phi_2$.
( * ) We can get "circularly polarized" light by putting the waves a quarter cycle (90°) out of phase with the same amplitude ($\phi_1 \pm \frac{\pi}{2} = \phi_2$ and $a_1 = a_2$).
Or we can get "elliptically polarized" light, which is anything in between linearly and circularly polarized.
Here's a picture of everything:
Your choice of coordinate system is called a "basis". Light behaves the same no matter which coordinates you pick to describe it, so it doesn't matter what basis you use.
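If you want to play with the three kinds of polarization, here's a small Python sketch. The function names are made up, and I'm assuming the shared argument of the two cosines is `k*z - w*t` (the sign convention is a choice). The classifier also counts a phase gap of exactly π, or a single cosine on its own, as linear, which is standard but isn't spelled out above.

```python
import math

def e_field(a1, a2, phi1, phi2, k=1.0, w=1.0, z=0.0, t=0.0):
    """(Ex, Ey): two perpendicular cosines sharing k, w, z and t.
    Assumption: the shared argument is k*z - w*t."""
    arg = k * z - w * t
    return a1 * math.cos(arg + phi1), a2 * math.cos(arg + phi2)

def classify(a1, a2, phi1, phi2, tol=1e-9):
    """Name the polarization from the amplitudes and phases."""
    delta = (phi2 - phi1) % (2 * math.pi)          # relative phase in [0, 2*pi)
    in_step = min(delta, 2 * math.pi - delta) < tol or abs(delta - math.pi) < tol
    if a1 < tol or a2 < tol or in_step:
        return "linear"       # also covers a single cosine, or a phase gap of pi
    quarter = abs(delta - math.pi / 2) < tol or abs(delta - 3 * math.pi / 2) < tol
    if quarter and abs(a1 - a2) < tol:
        return "circular"
    return "elliptical"

print(classify(1, 1, 0.3, 0.3))            # linear
print(classify(1, 1, 0.0, math.pi / 2))    # circular
print(classify(1, 2, 0.0, math.pi / 2))    # elliptical
```

For the circular case, `e_field(1, 1, 0.0, math.pi / 2, t=t)` keeps the same length at every `t` and only its direction rotates, which is where the name comes from.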

Now that we know how light works, we can do an experiment with light particles to try and figure out how quantum mechanics works.
To do the experiment, you send a linearly polarized light particle through a "polarization filter". We observe that the light particle either passes through the filter and aligns in the same direction as the filter, or it gets blocked by the filter.

It turns out there's a probability that a photon makes it through the filter. Interestingly, it only depends on the angle θ between the filter and the E-field. * !
One more important idea: The filter only allows photons to pass through if they align with the filter. And half a photon isn't allowed to pass through * . So either 100% of the photon aligns with the filter and the photon passes through. Or 0% aligns and the photon gets blocked. If 0% aligns with the filter, the E-field still needs to be somewhere - specifically, it must be perpendicular to the filter.
In other words, the filter causes the photon to either:
1. align 100% with the filter, or
2. align 100% perpendicular to the filter
(in case 2., the photon gets blocked).
This experiment is starting to look like the double slit experiment, where the electron became localized to a single position when it hit the screen. Here the photon is becoming localized to a single direction when it hits the filter!
We can reason that the photon's E-field is in a superposition over its two cosines. When the photon hits the filter, it gets measured and collapses either parallel ($\hat{y}$) or perpendicular ($\hat{x}$) to the filter.
Measurement is probabilistic. The filter needs to pick a probability for the incoming photon to align with $\hat{x}$, and a probability for it to align with $\hat{y}$. The only reasonable thing for the filter to do is to look at how much the incoming photon currently aligns with $\hat{x}$ and $\hat{y}$, and pick the probability based on that.
In other words, decompose the photon into $a_1\hat{x} + a_2\hat{y}$, and pick the probability based on $a_1$ and $a_2$.


This forces the universe to square the amplitudes to get the probabilities * , and we can see that $P[\text{pass}] = \text{Prob}_2 = \left(\frac{a_2}{a}\right)^2 = \cos^2\theta$!
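Here's a quick numerical check of that rule in Python. It's only a sketch: the helper names are made up, and I'm assuming the filter lies along $\hat{y}$ with $\theta$ measured from the filter. The last printed column is the sum of the unsquared amplitudes, to show that they can't serve as probabilities, because they don't add up to 1.

```python
import math

def decompose(theta, magnitude=1.0):
    """Split a linearly polarized photon into (a1, a2) along x-hat and y-hat.
    The filter is taken to lie along y-hat; theta is the angle to the filter."""
    return magnitude * math.sin(theta), magnitude * math.cos(theta)

def pass_probability(a1, a2):
    """Squared share of the amplitude that lies along the filter."""
    return a2 ** 2 / (a1 ** 2 + a2 ** 2)

for degrees in (0, 30, 45, 60, 90):
    theta = math.radians(degrees)
    a1, a2 = decompose(theta)
    p_pass = pass_probability(a1, a2)
    p_block = pass_probability(a2, a1)      # same rule, perpendicular direction
    print(degrees, round(p_pass, 3), round(p_block, 3), round(abs(a1) + abs(a2), 3))
```

The pass and block columns always add up to 1, and the pass column matches $\cos^2\theta$ no matter what `magnitude` you feed in.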
Here's an analogy to the double slit experiment. The photon is a "wave" that collapses when it hits the filter, based on the height of the wave at each location! Same exact thing as the double slit experiment, just over 2 things, not infinitely many!

A photon's E-field is in a superposition over its two cosines. A filter measures the incoming photon, causing its E-field to either collapse into a cosine that aligns with the filter, or a cosine that aligns perpendicular.
P[pass]=P[photon is measured to align with the filter].
P[blocked]=P[photon is measured to align perpendicular].
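To get a better idea of what the filter cares about, we can rewrite (1) as:
$$\vec{E}(z,t) = \mathrm{Re}\left[\left(a_1 e^{i\phi_1}\hat{x} + a_2 e^{i\phi_2}\hat{y}\right) e^{i(kz - wt)}\right] \tag{2}$$
The exponential factor $e^{i(kz - wt)}$ just shifts where the E-field is in its cycle, which the filter doesn't care about. The only thing the filter cares about is the "shape" of the photon, given by $a_1 e^{i\phi_1}\hat{x} + a_2 e^{i\phi_2}\hat{y}$ * ! Note that the shape is usually called the "state" of the photon.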
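If the complex form looks suspicious, here's a small Python check that "shape times exponential, then take the real part" gives back the two cosines. The function names and the test numbers are made up, and `arg` stands for $kz - wt$ (an assumed sign convention).

```python
import cmath
import math

def field_from_cosines(a1, a2, phi1, phi2, arg):
    """The two perpendicular cosines written out directly."""
    return a1 * math.cos(arg + phi1), a2 * math.cos(arg + phi2)

def field_from_shape(shape, arg):
    """Real part of shape * e^{i*arg}, where arg = k*z - w*t (assumed sign convention)."""
    spin = cmath.exp(1j * arg)
    return (shape[0] * spin).real, (shape[1] * spin).real

a1, a2, phi1, phi2 = 0.6, 0.8, 0.4, 1.9          # made-up amplitudes and phases
shape = (a1 * cmath.exp(1j * phi1), a2 * cmath.exp(1j * phi2))
for arg in (0.0, 0.7, 2.5):
    print(field_from_cosines(a1, a2, phi1, phi2, arg), field_from_shape(shape, arg))
```

Both columns agree at every point in the cycle, so the pair of complex numbers in `shape` really does carry everything about the photon except where it is in its cycle.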
Let's notate the incoming photon's state as
and let's notate the state of the filter's pass-through photon as
Note that when the filter decomposed the light into $\hat{y}$ and $\hat{x}$ components, it was really projecting the incoming photon onto $\hat{x}$ and $\hat{y}$.
Putting this in terms of $\vec{a}$ and $\vec{p}$, $\cos^2\theta$ is what we get when we project $\vec{a}$ onto $\vec{p}$ and square the result for linearly polarized light * .
This projection idea is how we'll extend our result from linearly polarized light to all polarized light.

First of all, the filter doesn't care about the magnitudes of either state. So we'll have to ignore them when we get the probability. It's common to notate the magnitude of a vector as $a = |\vec{a}| = \sqrt{a_1^2 + a_2^2}$. Typically we choose to always ignore the magnitudes by setting $a = 1$ and $p = 1$ * .
Now, the goal is to figure out how exactly to perform the projection of a onto p.
The projection's phase is clearly irrelevant to computing probability, so we should ignore it * .
Projection is typically done with the dot product.
If we combine these two ideas, we get the initial guess of (projection of $\vec{a}$ onto $\vec{p}$) $= |\vec{a}\cdot\vec{p}|$.
This works for linearly polarized light to give $\cos\theta$, but it gives nonsensical results for arbitrary $\vec{a}$ and $\vec{p}$ * . But it's an easy fix: all we need to do is conjugate one of the vectors before taking the dot product * .
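Here's one concrete way to see the problem, as a Python sketch. The example is my own pick (not necessarily the one in the footnote): circularly polarized light going into a filter that passes exactly that same light. The function names are made up.

```python
import math

def naive_overlap(a, p):
    """|a . p| with no conjugation (the initial guess)."""
    return abs(sum(x * y for x, y in zip(a, p)))

def conjugated_overlap(a, p):
    """|p* . a|: conjugate one vector before taking the dot product."""
    return abs(sum(x * y.conjugate() for x, y in zip(a, p)))

s = 1 / math.sqrt(2)
circular = (complex(s, 0), complex(0, s))      # equal amplitudes, 90 degrees apart
horizontal = (complex(1, 0), complex(0, 0))

print(naive_overlap(circular, circular))        # 0.0: the light "fails" its own filter
print(conjugated_overlap(circular, circular))   # 1 (up to rounding)
print(naive_overlap(horizontal, horizontal), conjugated_overlap(horizontal, horizontal))
```

Without the conjugate, the two $i$'s multiply to $-1$ and cancel the real part, so the guess says the light never passes a filter built to pass it. For real vectors (linear polarization) the conjugate changes nothing, which is why the guess looked fine at first.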
The projection of v1 onto v2 is defined as:
Defining probability as $|\vec{p}^* \cdot \vec{a}|^2$ works for all cases of linearly polarized light and one case of circularly polarized light, and we can reason that it works in general * !
Now that we're dealing with complex vectors, we don't say "perpendicular", we say "orthogonal". Naturally, vectors $\vec{v}_1$ and $\vec{v}_2$ are orthogonal when the projection of one onto the other is 0:
$$\vec{v}_1^* \cdot \vec{v}_2 = 0$$
Putting this all together,
This is a picture of measurement in general, for any incoming light described by $\vec{a}$, and any filter that lets through light described by $\vec{p}$ (of course, $\vec{p}$ is orthogonal to $\vec{p}_\perp$). Note that the phase is ignored:

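Example: linearly polarized light going into a linear filter *
(Note: * )
Example: circularly polarized light going into a linear filter *
(Note: * )
Example: circularly polarized light going into a circular filter *
Example: linearly polarized light going into a circular filter *
Example: oppositely (anti-) circularly polarized light going into a circular filter *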
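The five cases listed above can be checked numerically with a few lines of Python. This is a sketch with made-up names: `probability(a, p)` computes $|\vec{p}^* \cdot \vec{a}|^2$ after scaling both states to magnitude 1, the 30° angle is an arbitrary choice, and `CIRC`/`ANTI` are just the two opposite handednesses (which one you call left or right depends on convention).

```python
import math

def normalize(v):
    """Scale a state to magnitude 1 (the filter ignores magnitudes)."""
    size = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [complex(x) / size for x in v]

def probability(a, p):
    """|p* . a|^2 after scaling both states to magnitude 1."""
    a, p = normalize(a), normalize(p)
    return abs(sum(pi.conjugate() * ai for pi, ai in zip(p, a))) ** 2

def linear(theta):
    """Linearly polarized state at angle theta from x-hat."""
    return [math.cos(theta), math.sin(theta)]

CIRC = [1, 1j]        # y component a quarter cycle away from x
ANTI = [1, -1j]       # the opposite handedness

print(round(probability(linear(math.radians(30)), linear(0)), 3))   # 0.75
print(round(probability(CIRC, linear(0)), 3))                       # 0.5
print(round(probability(CIRC, CIRC), 3))                            # 1.0
print(round(probability(linear(math.radians(30)), CIRC), 3))        # 0.5
print(round(probability(ANTI, CIRC), 3))                            # 0.0
```

The first line is just $\cos^2 30°$. The last line is the complex version of "perpendicular": the two circular states are orthogonal, so one never passes the other's filter.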
The projection of a complex vector onto another one just requires conjugating one of the two vectors before we take their dot product * . This gives:
We reasoned that "orthogonal" for complex vectors should mean that the projection of one onto the other equals zero, i.e. $\vec{v}_1$ is orthogonal to $\vec{v}_2 \iff \vec{v}_1^* \cdot \vec{v}_2 = 0$.
It shouldn't come as a surprise that for the light polarization experiment, the wavefunction is defined as * :
The relevant quantity to measurement in the double slit experiment is * :
Even though in the double slit experiment positions are not physically orthogonal to each other, we describe measurement as if they are orthogonal * , just like in the photon polarization experiment! All the math is the same!
Consider the double slit with only 3 possible positions for the electron. Then, measuring the electron's position would look like this:

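Clearly, the wavefunction for the double slit experiment with only 3 possible positions is this: $a_1 e^{i\phi_1}\hat{x}_1 + a_2 e^{i\phi_2}\hat{x}_2 + a_3 e^{i\phi_3}\hat{x}_3$.
If we generalize this to 10 possible positions, the double slit's wavefunction is:
$$\vec{a} = \sum_{i=1}^{10} a_i e^{i\phi_i}\,\hat{x}_i \tag{3}$$
In the real experiment we have infinitely many positions, and the wavefunction is:
$$\vec{a} = \int_{-\infty}^{\infty} a(x)\, e^{i\phi(x)}\,\hat{x}(x)\,dx \tag{4}$$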
Each basis vector is orthogonal and normalized, as always * .
Below I notate complex numbers as a letter with an underscore, $\underline{a}$, which makes things much cleaner: $\underline{a} = a e^{i\phi}$.
Here's how we normalize (3) and (4), respectively:
$$\sum_{i=1}^{10} a_i^2 = 1 \qquad\text{and}\qquad \int_{-\infty}^{\infty} a(x)^2\,dx = 1$$
In the discrete case, what's the probability that we measure $\vec{a}$ in the third position $\hat{x}_3$? Well, if all the math is the same in both experiments, then it should be (the projection of $\vec{a}$ onto $\hat{x}_3$)$^2$.
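Here's the probability for (3) and (4), respectively:
$$P[\hat{x}_3] = |\hat{x}_3^* \cdot \vec{a}|^2 = a_3^2 \qquad\text{and}\qquad P[\hat{x}(x)] = a(x)^2\,dx$$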
The dx makes the continuous case go to 0 as expected * .
We can get the probability that the electron is in a region by just summing up all the probabilities there.
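Here's the probability in a region for (3) and (4), respectively:
$$P[\text{region}] = \sum_{i \in \text{region}} a_i^2 \qquad\text{and}\qquad P[\text{region}] = \int_{\text{region}} a(x)^2\,dx$$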
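Here's a Python sketch of both cases. Everything specific in it is made up for illustration: the 10 amplitudes and phases, the bell-shaped $a(x)^2$, the grid limits, and the helper names. The continuous case is approximated on a grid, so each cell carries $a(x)^2\,dx$.

```python
import cmath
import math

def normalize(amps):
    """Scale the amplitudes so the squared magnitudes sum to 1."""
    size = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / size for a in amps]

def prob_region(amps, positions):
    """Sum of the single-position probabilities |a_i|^2 over the region."""
    return sum(abs(amps[i]) ** 2 for i in positions)

# Discrete case: 10 positions with made-up amplitudes and phases.
state = normalize([(1 + i % 3) * cmath.exp(0.5j * i) for i in range(10)])
print(round(abs(state[2]) ** 2, 4))                 # third position alone
print(round(prob_region(state, range(5, 10)), 4))   # right half of the screen
print(round(prob_region(state, range(10)), 4))      # whole screen

# Continuous case on a grid: a(x)^2 dx for a made-up bell-shaped a(x)^2.
def grid_probs(n, lo=-5.0, hi=5.0):
    dx = (hi - lo) / n
    xs = [lo + (j + 0.5) * dx for j in range(n)]
    dens = [math.exp(-x * x) for x in xs]            # a(x)^2, not yet normalized
    total = sum(d * dx for d in dens)
    return xs, [d * dx / total for d in dens]

for n in (10, 100, 1000):
    xs, probs = grid_probs(n)
    right = sum(p for x, p in zip(xs, probs) if x > 0)
    print(n, round(max(probs), 4), round(right, 4))  # one cell shrinks, region stays put
```

As the grid gets finer, the probability of any single cell heads to 0 (that's the $dx$ at work), while the probability of the region $x > 0$ stays at one half. The phases never show up in any of these numbers.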
Above, the screen only measured the position of the electron, so it measured in the basis of $\hat{x}(x)$. But what if we have a measurement apparatus that measures, say, $\hat{b}$?
I haven't given any other examples besides position, but I figured I'd put this in to be complete * .
Here's the probability for (3) and (4), respectively:
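To make the discrete case concrete, here's a Python sketch with 3 positions. The basis `basis` and the state `state` are made up purely for illustration (any normalized, mutually orthogonal set of vectors would do), and `overlap` is a made-up helper name. The rule is the same as before: conjugate the measurement state, dot it with the wavefunction, and square the magnitude.

```python
import cmath
import math

def overlap(b, a):
    """b* . a: conjugate the measurement state, then dot."""
    return sum(bj.conjugate() * aj for bj, aj in zip(b, a))

def normalize(v):
    size = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / size for x in v]

n = 3
# A made-up orthonormal basis, picked only for illustration.
basis = [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
         for k in range(n)]

state = normalize([1, 2j, -1])                     # made-up 3-position state
probs = [abs(overlap(b, state)) ** 2 for b in basis]
print([round(p, 4) for p in probs], round(sum(probs), 4))
```

The three probabilities add up to 1, just like they did when measuring position, because the new basis vectors are also orthogonal and normalized.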
Here are the full details for the continuous case * .
Rather than using vectors and writing a∗⋅p, Paul Dirac invented bra-ket notation. It just changes our notation from vectors to matrices. When we start writing matrices and not just doing dot products, this notation becomes much easier to use than vectors and star.
It's called bra-ket notation because:
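$$\vec{a} = |a\rangle = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \end{pmatrix}$$
$$\vec{a}^\dagger = \vec{a}^{\top *} = \langle a| = \begin{pmatrix} a_1^* & a_2^* & a_3^* & \cdots \end{pmatrix}$$
$$\vec{a}^* \cdot \vec{b} = \langle a||b\rangle = \langle a|b\rangle = \begin{pmatrix} a_1^* & a_2^* & a_3^* & \cdots \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \end{pmatrix}$$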
If you're confused about a result, just look at the part that's not written in bra-ket notation and compare.
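The wavefunction for light polarization was $\vec{a} = a_1 e^{i\phi_1}\hat{x} + a_2 e^{i\phi_2}\hat{y}$. Now we can write it as $|\psi\rangle = \underline{a_1}|{\leftrightarrow}\rangle + \underline{a_2}|{\updownarrow}\rangle$.
For the discrete case of the double slit experiment, the wavefunction was $\vec{a} = \sum_{i=1}^{10} a_i e^{i\phi_i}\hat{x}_i$. Now we can write it as $|\psi\rangle = \sum_{i=1}^{10} \underline{a_i}|i\rangle$.
For the continuous case of the double slit experiment, the wavefunction was $\vec{a} = \int_{-\infty}^{\infty} a(x) e^{i\phi(x)}\hat{x}(x)\,dx$. Now we can write it as $|\psi\rangle = \int_{-\infty}^{\infty} \underline{a}(x)|x\rangle\,dx$.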
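All of these states $|\psi\rangle$ were normalized so that the sum of the squares of the $a_i$s was 1. Even the basis states are normalized. Typically, we normalize everything. A state $|\psi\rangle$ is normalized if and only if
$$\langle\psi|\psi\rangle = 1$$
Two states $|\phi\rangle$ and $|\psi\rangle$ are orthogonal if and only if
$$\langle\phi|\psi\rangle = 0$$
Measurement always takes place in an orthogonal basis. For the 3 examples above, that means $\langle{\leftrightarrow}|{\updownarrow}\rangle = 0$, $\langle i|j\rangle = 0$ for $i \neq j$, and $\langle x|x'\rangle = 0$ for $x \neq x'$. And a measurement basis is always normalized, i.e. $\langle i|i\rangle = 1$.
Putting these two ideas together, a measurement basis $|b\rangle_1, |b\rangle_2, \ldots$ always satisfies
$${}_i\langle b|b\rangle_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
And of course, the probability of measuring any state $|\psi\rangle$ to be in the state $|\phi\rangle$ is
$$|\langle\phi|\psi\rangle|^2$$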
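A wavefunction is described by a sum or integral over basis vectors:
$$|\psi\rangle = \sum_i \underline{a_i}|i\rangle \qquad\text{or}\qquad |\psi\rangle = \int_{-\infty}^{\infty} \underline{a}(x)|x\rangle\,dx$$
We assume every wavefunction is normalized (to be clear, this applies to basis vectors too):
$$\langle\psi|\psi\rangle = 1$$
We also assume all basis vectors are orthogonal, so that:
$$\langle i|j\rangle = 0 \quad\text{for } i \neq j$$
The probability of measuring $|\psi\rangle$ in state $|\phi\rangle$ is:
$$|\langle\phi|\psi\rangle|^2$$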
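$e^{i\phi}\left(a_1|{\leftrightarrow}\rangle + a_2|{\updownarrow}\rangle\right)$ gives the exact same probability as $\left(a_1|{\leftrightarrow}\rangle + a_2|{\updownarrow}\rangle\right)$.
$e^{i\phi}$ is called "global phase" because it adds phase globally to the state. The global phase doesn't matter to the probability, because we take an absolute value.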
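Here's a small Python check of that claim. The state, the target state, and the phase values are made up for illustration, and `probability` is a made-up helper that computes $|\langle\text{target}|\text{state}\rangle|^2$. For contrast, the last lines add phase to only one component (a relative phase), which does change the probability.

```python
import cmath
import math

def probability(state, target):
    """|<target|state>|^2: conjugate the target, dot, take the squared magnitude."""
    return abs(sum(t.conjugate() * s for t, s in zip(target, state))) ** 2

s = 1 / math.sqrt(2)
state = [0.6 + 0j, 0.8j]                 # made-up normalized state
diagonal = [complex(s), complex(s)]      # made-up state to measure against

base = probability(state, diagonal)
for phi in (0.5, 1.0, 3.0):
    shifted = [cmath.exp(1j * phi) * x for x in state]       # global phase
    print(round(base, 6), round(probability(shifted, diagonal), 6))

tilted = [state[0], cmath.exp(1j * 1.0) * state[1]]          # phase on one component only
print(round(probability(tilted, diagonal), 6))               # no longer equal to base
```

The global factor $e^{i\phi}$ pulls out of the sum and has magnitude 1, so the absolute value wipes it out. A phase on a single component can't be pulled out that way.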
A common question is if the global phase is unknowable, or if it's just irrelevant. The answer is that it's irrelevant. The global phase for the photon was the place in the E-field cycle, which we can certainly figure out. It just isn't relevant to probability.
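From (2) you should easily be able to reason why $\frac{1}{\sqrt{2}}\left(|{\leftrightarrow}\rangle + i|{\updownarrow}\rangle\right)$ gives circularly polarized light $|{\circlearrowleft}\rangle$. The $\hat{y}$ or $|{\updownarrow}\rangle$ component just lags 90° behind the $\hat{x}$ or $|{\leftrightarrow}\rangle$ component!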
This idea extends to all wavefunctions. In the double slit experiment, you can think of the phase of each basis state as the relative offset of the cosine wave at that position.
The next note goes over the more physics-y side of things: position, momentum, spin (the Stern-Gerlach experiment), and Schrodinger's equation, which tells you how the wave evolves without measurement, i.e. how it evolves until it reaches the screen. Schrodinger's equation just says that the wavefunction evolves and interferes the way you'd expect: the wave has the same shape as the E-field we saw here, and radiates spherically from every point that it occupies.
The interesting thing is that the wavefunction really is a full description of the particle. The wavefunction at one instant in time dictates how it will evolve in the future (assuming no measurement takes place).
There's also the less physics-y side of things: quantum computing. Stay tuned.
That completes the note, although we still have more to go: the Schrodinger Equations, the Stern Gerlach Experiment, entanglement, the Bloch Sphere, intuitions on quantum teleportation and quantum computing, and more.
Let me know if you liked this note, or if there were any places I should improve (just leave a comment!). If you got stuck somewhere, I urge you to leave a question/comment in that location.