\( \newcommand{\u}{\mathbf{u}} \newcommand{\v}{\mathbf{v}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\zero}{\mathbf{0}} \newcommand{\e}{\epsilon} \newcommand{\g}{\gamma} \newcommand{\b}{\beta} \newcommand{\NN}{\mathbb{N}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\RR}{\mathbb{R}} \newcommand{\ZZ}{\mathbb{Z}} \)
This page shows the development of the Lorentz Transformation from special relativity, with emphasis on a (moderately) rigorous mathematical development from Einstein's postulates.
An inertial frame of reference is a frame of reference in which Newton's First Law holds; that is to say, particles at rest stay at rest and particles in motion stay in motion at constant velocity unless acted on by external forces. If a particular (right-handed, Cartesian) coordinate system, at rest within the frame of reference, has been chosen, then it will be called an inertial coordinate system.
Because affine transformations map straight lines to straight lines, it is clear that any affine change of coordinates of an inertial coordinate system \(S\) into a new coordinate system \(S'\) will also yield an inertial coordinate system. In the next section we will see that the converse is true: the coordinates of inertial frames of reference are always related by a natural affine transformation.
We now state Einstein's two postulates of relativity:
The First Postulate of Relativity is that the laws of physics are identical in all inertial frames of reference.
The Second Postulate of Relativity is that the speed of light (\(c\)) is the same when measured in any inertial frame of reference.
The principle of Homogeneity of Spacetime states that the results of any experiment or measurement under identical conditions will be the same no matter where (or when) they are made. This is an immediate consequence of the First Postulate: if there were an experiment which produced different results at points \(\x_1\) and \(\x_2\) in the inertial frame \(S\), then we could consider the new frame \(S'\) related to \(S\) by the affine map \(\x \mapsto \x + (\x_2 - \x_1)\), in which the event at \(\x_1\) in \(S\) has the same coordinates in \(S'\) as \(\x_2\) had in \(S\), and therefore should be indistinguishable from it. (Note here that the points, or "events", \(\x\) are four-vectors of the form \((t, x, y, z)\), so this argument addresses homogeneity of both time and space.)
The principle of Isotropy of Spacetime states that the results of any measurement or experiment under identical conditions will be the same no matter what direction is chosen. As in the previous paragraph this follows from the First Postulate by considering a new frame of reference derived from the original one by a rotation (in the 3 spatial dimensions).
Note: For convenience when considering two (or more) inertial frames of reference we shall usually assume that they have a common origin (i.e., both systems' clocks are synchronized at \(t=t'=0\) and at that instant the origins of the two spatial coordinate systems are at the same point).
Let \(S\) and \(S'\) be two inertial frames of reference (with common origin points). Let events in \(S\) be coordinatised as \((t,x,y,z)\) and events in \(S'\) as \((t', x', y', z')\). Let \(f\) be the transformation which maps the coordinates \((t,x,y,z)\) of a physical event \(E\) as measured in \(S\) to the coordinates \((t', x', y', z')\) of the same event measured in \(S'\). In this section we shall show, using the homogeneity of spacetime, that the mapping \(f\) between the coordinates of two inertial reference frames \(S\) and \(S'\) must be linear.
Consider two events whose coordinates in \(S\) are \(\x\) and \(\x + \Delta\x\), so that their coordinates in \(S'\) are \(f(\x)\) and \(f(\x + \Delta\x)\). Imagine that \(\Delta\x\) stays constant while \(\x\) varies, and consider \(f(\x + \Delta\x) - f(\x)\) purely as a function of \(\x\). If that function varied at all, then we could use it to distinguish different points of spacetime, contrary to the homogeneity of spacetime. Thus, \(f(\x + \Delta\x) - f(\x)\) must depend only on \(\Delta\x\), which we can write as \[ f(\x + \Delta\x) - f(\x) = g(\Delta\x) \] for some function \(g\). Rewriting this with different names for the quantities, we see \[ f(\x) - f(\y) = g(\x - \y) \]
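The role of homogeneity here can be illustrated numerically in one dimension. The sketch below is only an illustration, not part of the proof: the two maps `affine` and `quadratic`, their coefficients, and the helper `displacement_images` are hypothetical choices made for this example. It evaluates \(f(\x + \Delta\x) - f(\x)\) at several base points for a fixed displacement.

```python
# Hypothetical 1-D illustration: compare an affine map with a non-affine one.

def affine(x):
    # f(x) = 3x + 2: an affine coordinate change (coefficients chosen arbitrarily)
    return 3.0 * x + 2.0

def quadratic(x):
    # f(x) = x^2: a non-affine map, used as a counterexample
    return x * x

def displacement_images(f, dx, base_points):
    # f(x + dx) - f(x) evaluated at each base point x
    return [f(x + dx) - f(x) for x in base_points]

base_points = [-2.0, 0.0, 1.0, 5.0]
affine_diffs = displacement_images(affine, 0.5, base_points)
quadratic_diffs = displacement_images(quadratic, 0.5, base_points)

print(affine_diffs)     # the same value at every base point
print(quadratic_diffs)  # varies with the base point
```

For the affine map the difference is the same at every base point, as the argument requires; for the quadratic map it varies, so such a map could be used to tell points of spacetime apart.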
Lemma 1. The function \(g\) is additive.
Proof. Given \(\x\) and \(\y\) choose \(\x_1\), \(\x_2\), and \(\x_3\) such that \(\x = \x_3 - \x_2\) and \(\y = \x_2 - \x_1\) (e.g., \(\x_3 = \x\), \(\x_2 = \zero\) and \(\x_1 = -\y\)). Then \[ g(\x) = g(\x_3 - \x_2) = f(\x_3) - f(\x_2) \] and \[ g(\y) = g(\x_2 - \x_1) = f(\x_2) - f(\x_1) \] so that \[ \begin{align} g(\x) + g(\y) &= f(\x_3) - f(\x_2) + f(\x_2) - f(\x_1) \\ &= f(\x_3) - f(\x_1) \\ &= g(\x_3 - \x_1) \\ &= g(\x + \y) \end{align} \] and so \(g\) is additive.
Lemma 2. If \(g\) is continuous then it is linear.
Proof. Fix \(\x\), and let \(n\in\NN\). Then \[ \underbrace{g\left(\frac{1}{n}\x\right) + \cdots + g\left(\frac{1}{n}\x\right)}_{n \text{ times}} = g(\x) \] and so \(g(\frac{1}{n}\x) = \frac{1}{n}g(\x)\). Likewise, \(g(\frac{m}{n}\x) = mg(\frac{1}{n}\x) = \frac{m}{n}g(\x)\) for any \(m\in\ZZ\), and so \(g(q\x)=qg(\x)\) for all \(q\in\QQ\). By continuity, \(g(r\x) = rg(\x)\) for all \(r\in\RR\).
Corollary 3. Assuming the origins of \(S\) and \(S'\) coincide, then \(f\) is linear.
Proof. \(f(\x) = g(\x) + f(\zero) = g(\x)\), which is linear.
Note: There is a short proof of linearity using differentiation (see Rindler, p.11). The present proof has the advantage of assuming only that \(f\) is continuous (i.e., no sudden "teleportation" as one moves) rather than also requiring us to assume differentiability.
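Note: The additivity condition \(g(\x + \y) = g(\x) + g(\y)\) of Lemma 1 is known as Cauchy's functional equation; Lemma 2 is the standard result that its continuous solutions are linear.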
Let \(S\) be an inertial frame of reference, and let \(E_i\) (\(i=1,2\)) be two events observed in \(S\). Say that these two events are "light-separated" if it is possible for a beam of light emitted at \(E_1\) to arrive at \(E_2\). Writing the spacetime coordinates of \(E_i\) as \((t_i, x_i, y_i, z_i),\) it follows that the spatial distance between the two events is \[ \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 } \] and, since light travels this distance in the time interval \((t_2 - t_1)\), this is also equal to \(c(t_2 - t_1)\). It follows that \[ c^2(t_2 - t_1)^2 - (x_2 - x_1)^2 - (y_2 - y_1)^2 - (z_2 - z_1)^2 = 0 \] and, by the Second Postulate, this must be true in the coordinate systems of any inertial frame of reference observing the same two light-separated events.
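As a concrete check of the light-separation condition, the following sketch evaluates the expression above for a light pulse that travels 5 m. The function name `interval`, the constant name `C_LIGHT`, and the sample events are assumptions made for this illustration; SI units are assumed.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s (SI value)

def interval(e1, e2, c=C_LIGHT):
    """Return c^2 dt^2 - dx^2 - dy^2 - dz^2 for events given as (t, x, y, z)."""
    dt, dx, dy, dz = (b - a for a, b in zip(e1, e2))
    return (c * dt) ** 2 - dx ** 2 - dy ** 2 - dz ** 2

# Light emitted at the origin arrives at the point (3 m, 4 m, 0 m), which is 5 m away,
# so the travel time is 5 / c.
emission = (0.0, 0.0, 0.0, 0.0)
arrival = (5.0 / C_LIGHT, 3.0, 4.0, 0.0)

# An event at the same place but twice as late is not light-separated from the emission.
late_arrival = (10.0 / C_LIGHT, 3.0, 4.0, 0.0)

print(interval(emission, arrival))       # zero up to rounding error
print(interval(emission, late_arrival))  # about 75, i.e. (10 m)^2 - (5 m)^2
```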
We now show that the value of the expression \[ c^2(t_2 - t_1)^2 - (x_2 - x_1)^2 - (y_2 - y_1)^2 - (z_2 - z_1)^2, \] which is known as the spacetime interval, is the same in any inertial frame of reference, when observing any two events (not necessarily light-separated).
To this end, let \(S\) and \(S'\) be two inertial frames of reference with fixed coordinate systems and common origins. We regard the spacetime events within the two coordinate systems as points in \(\RR^4\). Recalling that the map which maps the coordinates of events in \(S\) to the coordinates of the corresponding events in \(S'\) is linear, we write \(T\) for the \(4\times 4\) matrix of this transformation.
Now write \(Z\) for the matrix \[ \begin{bmatrix} c^2 && 0 && 0 && 0 \\ 0 && -1 && 0 && 0 \\ 0 && 0 && -1 && 0 \\ 0 && 0 && 0 && -1 \\ \end{bmatrix} \] so that \(\u^t Z \u = c^2t^2 - x^2 - y^2 - z^2\) for \(\u = (t,x,y,z)^t\). Then the events \((0,0,0,0)^t\) and \(\u = (t,x,y,z)^t\) in \(S\) are light-separated if and only if \(\u^t Z \u = 0\). Since the same is true for the events \((0,0,0,0)^t\) and \(\u' = (t',x',y',z')^t\) in \(S'\), it follows that \(\u^t T^t Z T\u = 0\) if (and only if) \(\u^t Z \u = 0\). We shall now prove by linear algebra that there must be a constant \(C\) such that \[ \u^t T^t Z T\u = C \u^t Z \u \] for all \(\u\in\RR^4\).
Definition 4. Let \(f\) be a symmetric bilinear form on the real vector space \(V\). Then \(f\) is said to be indefinite if there exists a vector \(u\in V\) such that \(f(u, u) > 0\) and there also exists a vector \(v\in V\) such that \(f(v, v) < 0\).
Lemma 5. Let \(V\) be a finite-dimensional real vector space and let \(f\) be an indefinite symmetric bilinear form on \(V\). Suppose that \(g\) is a symmetric bilinear form on \(V\) and that \(g(u, u) = 0\) whenever \(f(u, u) = 0\). Then there is a constant \(C\) such that \(g = Cf\).
Proof. Fix a basis of \(V\) and let \(T\) be the symmetric matrix representing \(f\) with respect to this basis. Let \(V^+\) be the span of the eigenvectors of its positive eigenvalues and let \(V^-\) and \(V^0\) likewise be the spans of the negative and zero eigenvectors of \(T\). Clearly from the theory of symmetric matrices \(V = V^+ \oplus V^- \oplus V^0\), and \(f(u, v) = 0\) for any \(u\) and \(v\) in distinct choices of \(V^+\), \(V^-\) and \(V^0\). (This just assembles the parts we need of Sylvester's law of inertia).
For convenience write \(f'\) and \(g'\) for the associated quadratic forms. In other words, \(f'(u) = f(u, u)\) and \(g'(u) = g(u, u)\).
Now we show that also \(g(u, v) = 0\) for any \(u\) and \(v\) in distinct choices of \(V^+\), \(V^-\) and \(V^0\). First, let \(u^+\) and \(u^-\) be nonzero vectors in \(V^+\) and \(V^-\) respectively. Then \(f'(u^+) > 0\) and \(f'(u^-) < 0\) and so take a suitable \(\lambda \not= 0\) such that \[ f'(u^+ + \lambda u^-) = f'(u^+) + 2\lambda f(u^+, u^-) + \lambda^2 f'(u^-) = f'(u^+) + \lambda^2 f'(u^-) = 0 \] A similar calculation shows that for the same \(\lambda\), \(f'(u^+ - \lambda u^-) = 0\). It follows by assumption that \(g'\) is also zero on these two vectors. Thus, by the polarization identity \[ 4\lambda g(u^+, u^-) = g'(u^+ + \lambda u^-) - g'(u^+ - \lambda u^-) = 0. \] Next, let \(u^+\in V^+\) and \(u^0\in V^0\). Choose a suitably scaled \(u^-\in V^-\) so that \(f'(u^-) = -f'(u^+)\). Then \[ \begin{align} f'(u^+ + u^- + u^0) &= f'(u^+) + f'(u^-) + f'(u^0) + 2f(u^+, u^-) + 2f(u^+, u^0) + 2f(u^-, u^0) \\ &= f'(u^+) + f'(u^-) \\ &= 0 \end{align} \] Moreover, the same readily holds with \(u^+\) replaced by \(-u^+\), and, by assumption, \(g'\) also vanishes on both of these vectors. Thus, \[ 0 = g'(u^+ + u^- + u^0) - g'(-u^+ + u^- + u^0) = 4g(u^+, u^-) + 4 g(u^+, u^0) \] so that \(g(u^+, u^0) = -g(u^+, u^-) = 0\), by what was seen previously. By exchanging the roles of \(u^+\) and \(u^-\) in the last calculation it also follows that \(g(u^-, u^0) = 0\) for any \(u^-\in V^-\) and \(u^0 \in V^0\).
Next, suppose that \(u^+, v^+\in V^+\) and \(f'(u^+) = f'(v^+)\). We shall see this implies \(g'(u^+) = g'(v^+)\). For simply choose a suitably scaled \(w^-\) in \(V^-\) so that \(-f'(w^-)\) is equal to the common value of \(f'(u^+)\) and \(f'(v^+)\). Then \(f'(u^+ + w^-) = f'(v^+ + w^-) = 0\) and so the same is true for \(g'\). But thus (using the results of the previous paragraph) \(0 = g'(u^+ + w^-) = g'(u^+) + g'(w^-)\) and so \(g'(u^+) = -g'(w^-)\). Since by the same token, \(g'(v^+) = -g'(w^-)\), the result follows.
Let \(u^+\) and \(v^+\) be arbitrary non-zero vectors in \(V^+\). Clearly there are some numbers \(C, C'\) such that \[ g'(u^+) = Cf'(u^+) \text{ and } g'(v^+) = C'f'(v^+) \] and of course this holds too for scalar multiples of \(u^+\) and \(v^+\). Scaling \(u^+\) and \(v^+\) appropriately, we can arrange \(f'(u^+) = f'(v^+)\) from which it follows \(g'(u^+) = g'(v^+)\). Since \(f'(u^+)\) and \(f'(v^+)\) are non-zero, \(C\) and \(C'\) must be equal. Thus there is a single \(C\) such that \(g'(u^+) = Cf'(u^+)\) for all \(u^+\in V^+\). A similar argument ensures that there is a \(D\) such that \(g'(u^-) = Df'(u^-)\) for all \(u^-\in V^-\). Moreover, choosing \(u^+\in V^+\) and \(u^-\in V^-\) such that \(f'(u^+) = 1\) and \(f'(u^-) = -1\), it follows that \(f'(u^+ + u^-) = 0\), so \(g'(u^+ + u^-) = 0\) and \[ 0 = g'(u^+ + u^-) = g'(u^+) + g'(u^-) = Cf'(u^+) + Df'(u^-) = C - D \] It follows that \(g'(u) = Cf'(u)\) for \(u\) in \(V^+\), or in \(V^-\), or, of course, in \(V^0\) (where both sides vanish). But from the orthogonality results above, \[ f'(u^+ + u^- + u^0) = f'(u^+) + f'(u^-) + f'(u^0) \] and \[ g'(u^+ + u^- + u^0) = g'(u^+) + g'(u^-) + g'(u^0) \] so that \(g'(u) = Cf'(u)\) for any \(u\in V\). That \(g = Cf\) follows immediately from polarization.
Now, returning to consideration of events in spacetime, recall that we record events in the inertial frame \(S\) as \((t, x, y, z)\) and that the corresponding events \((t', x', y', z')\) in the inertial frame \(S'\) are related by a linear map with matrix \(T\), so that the coordinates of a single event in the two frames satisfy \[ \begin{bmatrix} t' \\ x' \\ y' \\ z' \\ \end{bmatrix} = T \begin{bmatrix} t \\ x \\ y \\ z \\ \end{bmatrix}. \] We also introduced the matrix \(Z\) so that, by the argument at the start of this section, \(\u^t T^t Z T\u = 0\) if (and only if) \(\u^t Z \u = 0\). Since the bilinear form \((\u,\v) \mapsto \v^t Z \u\) is clearly indefinite, it follows from Lemma 5 that there is a constant \(C\) such that \[ \v^t T^tZT \u = C \v^t Z \u \] for any events \(\u\) and \(\v\) (without any assumptions about light-separation). We shall call the number \(C\) in this equation the linking coefficient between \(S\) and \(S'\). Our goal now is to show that \(C = 1\).
Now let \(S_1\), \(S_2\) and \(S_3\) be any three inertial coordinate systems with coordinates \(\u^{(i)} = (t^{(i)}, x^{(i)}, y^{(i)}, z^{(i)})\) and matrices \(T_{ji}\) which transform coordinates of events \(\u^{(i)}\) in \(S_i\) to the corresponding coordinates \(\u^{(j)}\) in \(S_j\). By the First Postulate, \(T_{ij}T_{jk} = T_{ik}\).
From the above, there are linking coefficients \(C_{ij}\) such that \[ \u^t T_{ij}^tZT_{ij} \u = C_{ij} \u^t Z \u \] and so \[ C_{ik} \u^t Z \u = \u^t T_{ik}^tZT_{ik} \u = \u^t T_{jk}^tT_{ij}^tZT_{ij}T_{jk} \u = C_{ij} \u^t T_{jk}^tZT_{jk} \u = C_{ij}C_{jk} \u^t Z\u \] and so \(C_{ik} = C_{ij}C_{jk}\).
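To see why Lemma 5 alone cannot give \(C = 1\), it may help to look at linear maps that preserve the light cone without preserving the interval. The sketch below assumes units in which \(c = 1\) and uses a hypothetical family of maps: a spatial rotation about the \(z\)-axis followed by a uniform dilation by a factor `k`. All helper names are made up for this illustration, and these maps are not claimed to relate physical inertial frames; they only show a linking coefficient of \(k^2\) and the multiplication rule \(C_{ik} = C_{ij}C_{jk}\).

```python
import math

# Z = diag(1, -1, -1, -1): the interval form in units with c = 1 (an assumption of this sketch)
Z = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dilated_rotation(k, theta):
    """Hypothetical map: rotate by theta about the z-axis, then scale all coordinates by k."""
    cs, sn = math.cos(theta), math.sin(theta)
    return [[k, 0, 0, 0],
            [0, k * cs, -k * sn, 0],
            [0, k * sn, k * cs, 0],
            [0, 0, 0, k]]

def linking_coefficient(t):
    """Read C off T^t Z T = C Z using the top-left entry (Z[0][0] = 1)."""
    return matmul(transpose(t), matmul(Z, t))[0][0]

def preserves_form_up_to(t, c, tol=1e-12):
    """Check T^t Z T = c Z entry by entry."""
    m = matmul(transpose(t), matmul(Z, t))
    return all(abs(m[i][j] - c * Z[i][j]) <= tol for i in range(4) for j in range(4))

t_ij = dilated_rotation(2.0, 0.3)
t_jk = dilated_rotation(0.5, 1.1)
t_ik = matmul(t_ij, t_jk)

c_ij = linking_coefficient(t_ij)   # k^2 = 4
c_jk = linking_coefficient(t_jk)   # k^2 = 0.25
c_ik = linking_coefficient(t_ik)   # equals c_ij * c_jk

print(c_ij, c_jk, c_ik)
print(preserves_form_up_to(t_ik, c_ij * c_jk))
```

Each of these maps sends null vectors to null vectors, yet only the composite has linking coefficient 1, which is why the physical arguments that follow are needed.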
Now, the value of the linking coefficient in the equation relating \(S\) and \(S'\) is ostensibly determined by the choice of coordinates within \(S\) and \(S'\) as well as the velocity vector of the origin of \(S'\) with respect to the origin of \(S\). However, it does not in fact depend on the orientation of the spatial coordinates in \(S\). To see this, take \(S_1\) and \(S_2\) to be two coordinate systems in \(S\), at rest with respect to each other and having a common origin, but with possibly different orientation of their spatial coordinates, and let \(S_3\) be the original coordinate system for \(S'\). Clearly the transformation \(T_{21}\) is just a block matrix \[ \begin{bmatrix} 1 && 0 \\ 0 && U \end{bmatrix} \] where \(U\) is a \(3\times 3\) orthogonal matrix, from which it is clear that \(C_{21}=1\). But thus \(C_{31} = C_{32}C_{21}= C_{32}\) and so the change of (spatial) coordinate system in \(S\) makes no difference. Similarly a change of coordinate system in \(S'\) does not affect the linking coefficient. But therefore, too, the direction of the velocity vector of \(S'\) with respect to \(S\) cannot affect the linking coefficient, because if the direction changed, and at the same time the coordinate systems within \(S\) and \(S'\) changed orientation in the same way, then the overall system would be unchanged except in direction, and by the isotropy of spacetime this cannot affect the outcome of any measurement. Thus for any two inertial coordinate systems, the linking coefficient depends only on the magnitude of the velocity of \(S'\) with respect to \(S\).
Finally, again consider three inertial coordinate systems, \(S_i\) (\(i=1,2,3\)) and the relationship \(C_{23} = C_{21}C_{13}\) between their linking coefficients. We have seen that the coefficient \(C_{23}\) depends only on the magnitude of the velocity between \(S_2\) and \(S_3\). Consider the case where \(S_2\) and \(S_3\) are both moving with the same velocity (speed and direction) with respect to \(S_1\), so that they are at rest relative to each other. Then \(C_{21}C_{13}=C_{23} = 1\). Consider now the case where \(S_2\) and \(S_3\) are both moving with the same absolute speed with respect to \(S_1\) as before, but now in opposite directions. The linking coefficients \(C_{21}\) and \(C_{13}\) are unchanged, and therefore their product, \(C_{23}\), is still 1. Because no assumptions were made about the velocity vector between \(S_2\) and \(S_3\), this conclusion holds in general for any two inertial coordinate systems.
Thus we have shown that for any two inertial frames \(S\), \(S'\) and an event having coordinates \((t,x,y,z)\) in \(S\) and \((t',x',y',z')\) in \(S'\), \[ c^2t^2 - x^2 - y^2 - z^2 = c^2t'^2 - x'^2 - y'^2 - z'^2 \] Moreover, by rebasing our reference frames to have a different (common) origin, \[ c^2\Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2 = c^2\Delta t'^2 - \Delta x'^2 - \Delta y'^2 - \Delta z'^2 \] for the difference between any two events in \(S\) and \(S'\).
(This section draws inspiration from Elton)
Let \(S\) and \(S'\) be inertial reference frames. Specify the origins in spacetime of the two frames to coincide on a particular event. (So, at time \(t=t'=0\) the three spatial coordinates of both frames will be zero.) In this section we shall show that by suitably modifying the coordinate systems we can simplify the relationship between the two systems and assume that the \(x\)-axes coincide, the frame \(S'\) moves in the positive \(x\) direction, and the \(y\) and \(z\) axes are aligned.
Take the \(x\)-axis of \(S\) to be in the direction of motion of the origin of \(S'\). Since the transformation of coordinates between \(S\) and \(S'\) is linear, this line corresponds to a straight line in \(S'\) and by the First Postulate observers in \(S\) and \(S'\) both agree that the origin of \(S'\) proceeds along this line. Take this to be the \(x'\)-axis of \(S'\).
Now pick arbitrary lines in the \(S\) frame, orthogonal to each other and to the \(x\)-axis, and label these as \(y\) and \(z\)-axes to form a right-handed system. Similarly, pick \(y'\) and \(z'\) axes in \(S'\) to form a right-handed system. By linearity, it follows that \[ \begin{align} y' &= a_0 t + a_1 x + a_2 y + a_3 z \\ z' &= b_0 t + b_1 x + b_2 y + b_3 z \\ \end{align} \] and since events on the \(x\)-axis (i.e., those with \(y=z=0\)) always correspond to events on the \(x'\)-axis (i.e., with \(y'=z'=0\)), then \[ \begin{align} 0 &= a_0 t + a_1 x \\ 0 &= b_0 t + b_1 x \\ \end{align} \] for all \(t\) and \(x\). Thus \(a_0=a_1=b_0=b_1=0\) and \[ \begin{align} y' &= a_2 y + a_3 z \\ z' &= b_2 y + b_3 z \\ \end{align} \]
Now imagine an observer at the origin of \(S'\), looking back at the \(x'=0\) plane and imagine that a circle of radius 1 has been drawn around the origin in the \(x=0\) plane of \(S\). From isotropy of spacetime and the fact that the \(x'=0\) plane is symmetric in rotation about the \(x'\) axis, it follows that the observer in \(S'\) will also see a circle. But thus the \(2\times 2\) matrix \[ A = \begin{bmatrix} a_2 && a_3 \\ b_2 && b_3 \end{bmatrix} \] transforms circles into circles. By polar decomposition \(A = UP\) where \(U\) is an orthogonal matrix and \(P\) is a positive semi-definite symmetric matrix. Now \(P\) has two non-negative eigenvalues and if they were distinct then \(P\), and hence \(A\), would map circles to ellipses with non-zero eccentricity. Thus \(P = kI\) for some scale factor \(k > 0\) (not to be confused with the speed of light \(c\)) and \(A = kU\) is a multiple of an orthogonal matrix. But, reversing the transformation, an observer at the origin of \(S\) looking at a circle drawn in the \(x'=0\) plane would see the circle dilated by a factor of \(1/k\) and yet, by isotropy of spacetime, they should also see it dilated by a factor of \(k\). Thus \(k = 1/k\) and \(k=1\). It follows that the transformation \(A\) is an orthogonal transformation.
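The claim that distinct eigenvalues of \(P\) would turn circles into proper ellipses can be checked numerically. The sketch below is an illustration under assumed sample matrices (a rotation scaled by 1.5, and a symmetric stretch with eigenvalues 2 and 1); the helper `radius_range` is a name invented here. It maps points of the unit circle through a \(2\times 2\) matrix and reports the smallest and largest distance from the origin.

```python
import math

def radius_range(a, samples=360):
    """Min and max distance from the origin of the image of the unit circle under matrix a."""
    radii = []
    for i in range(samples):
        angle = 2.0 * math.pi * i / samples
        y, z = math.cos(angle), math.sin(angle)
        yp = a[0][0] * y + a[0][1] * z
        zp = a[1][0] * y + a[1][1] * z
        radii.append(math.hypot(yp, zp))
    return min(radii), max(radii)

theta = 0.7  # arbitrary rotation angle
scaled_rotation = [[1.5 * math.cos(theta), -1.5 * math.sin(theta)],
                   [1.5 * math.sin(theta), 1.5 * math.cos(theta)]]
stretch = [[2.0, 0.0], [0.0, 1.0]]   # symmetric, eigenvalues 2 and 1

print(radius_range(scaled_rotation))  # both values close to 1.5: the image is a circle
print(radius_range(stretch))          # close to (1.0, 2.0): the image is an ellipse
```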
In principle, for a general orthogonal transformation, \(\mathop{det}(A) = \pm 1\), but we can rule out the case \(\mathop{det}(A) = -1\) with a continuity argument. When the velocity of \(S'\) with respect to \(S\) is zero, then, physically, the transformation between the two right-handed coordinate systems of \(S\) and \(S'\) is a rotation and so in this case \(\mathop{det}(A) = 1\). If we imagine slowly increasing the relative velocity in tiny increments, we expect the determinant to vary continuously and so a sudden jump from \(+1\) to \(-1\) is not possible. Thus \(A\) is a rotation in all cases.
It follows that by making a rotation of the \(y'-z'\) axes about the \(x'\) axis we can assume that \[ \begin{align} y' &= y \\ z' &= z \\ \end{align} \]
Now also we know that for some coefficients (different from the \(a_i\)'s and \(b_i\)'s used above) \[ \begin{align} t' &= a_0 t + a_1 x + a_2 y + a_3 z \\ x' &= b_0 t + b_1 x + b_2 y + b_3 z \\ \end{align} \] and we also know \[ c^2t^2 - x^2 - y^2 - z^2 = c^2t'^2 - x'^2 - y'^2 - z'^2. \] But since we have arranged by choice of coordinates that \(y = y'\) and \(z=z'\), it follows that \[ c^2t^2 - x^2 = c^2t'^2 - x'^2 \] Now, differentiating with respect to \(y\), \[ 0 = 2c^2t' a_2 - 2x' b_2. \] Since this holds for all \(x'\) and \(t'\) it follows that \(a_2 = b_2 = 0\). By a similar argument with \(z\), \(a_3 = b_3 = 0\) as well. Thus \[ \begin{align} t' &= a_0 t + a_1 x \\ x' &= b_0 t + b_1 x \\ y' &= y \\ z' &= z \\ \end{align} \]
In the next section we will analyze this transformation restricted to \(x, x'\) and \(t, t'\) and calculate the values of the remaining coefficients.
To complete the computation of the Lorentz Transformation, start by assuming that coordinate systems for \(S\) and \(S'\) have been chosen as in the previous section. Since in this case the only non-trivial relationships are between \(x, t\) and \(x', t'\), we write \(\tau = ct\) and \(\tau'=ct'\), and consider only the linear transformation from \([x, \tau]\) to \([x', \tau']\). Also, conventionally, write \(\beta = v/c\), where \(v\) is the speed of \(S'\) relative to \(S\), and \(\gamma = 1/\sqrt{1 - \beta^2}\).
Suppose that \[ \begin{bmatrix}x' \\ \tau'\end{bmatrix} = \begin{bmatrix} a_{11} && a_{12} \\ a_{21} && a_{22} \\ \end{bmatrix} \begin{bmatrix}x \\ \tau\end{bmatrix} \] or, more concisely \(\x' = A \x\). Then, writing \[ Z = \begin{bmatrix} 1 && 0 \\ 0 && -1\end{bmatrix}, \] by invariance of the space-time interval, \[ \x^tA^tZA\x = (\x')^tZ\mathbf{x'} = x'^2 - \tau'^2 = x^2 - \tau^2 = \x^tZ\x \] By the Polarization Identity, \(\x^tA^tZA\mathbf{y} = \x^tZ\mathbf{y}\) for any \( \mathbf{x}\), \(\mathbf{y}\), and so \[ A^tZA = Z \] From this it immediately follows that \((\mathop{det}A)^2 = 1\) and so we write \(\mathop{det}A = \e\) where \(\e = \pm 1\). Also, by rearranging, \(ZAZ = (A^t)^{-1}\) and so \[ \begin{bmatrix}a_{11} && -a_{12} \\ -a_{21} && a_{22}\end{bmatrix} = \e \begin{bmatrix}a_{22} && -a_{21} \\ -a_{12} && a_{11}\end{bmatrix} \] Hence, \(a_{22} = \e a_{11}\) and \(a_{21} = \e a_{12}\).
Now consider the origin in \(S'\) and observe that at time \(t\) the \(x\)-coordinate of the origin of \(S'\) in \(S\) is \(vt\). This corresponds to the vector \([\frac{v}{c}\tau, \tau]\) and so the Lorentz transformation for the corresponding \(x'\) coordinate (which is of course zero) gives \[ 0 = a_{11}\frac{v}{c}\tau + a_{12}\tau = (a_{11}\b + a_{12})\tau \] and so \(a_{12} = -\b a_{11}\). From this \[ A = \begin{bmatrix} a_{11} && -\b a_{11} \\ -\e \b a_{11} && \e a_{11}\end{bmatrix} = a_{11} \begin{bmatrix} 1 && -\b \\ -\e \b && \e \end{bmatrix} \] Taking determinants again, \[ a_{11}^2(1 - \b^2)\e = \e \] and so \[ a_{11}^2 = \frac{1}{1 - \b^2} = \g^2 \] Finally, choosing \(\e_1, \e_2 = \pm1\) such that \(a_{11} = \e_1\g\) and \(\e_2 = \e\e_1\), \[ A = \g \begin{bmatrix} \e_1 && 0 \\ 0 && \e_2 \end{bmatrix} \begin{bmatrix} 1 && -\b \\ -\b && 1 \end{bmatrix} \]
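All four sign choices in this factorisation satisfy \(A^tZA = Z\), which is why a further physical argument is needed to select one of them. The following sketch checks this numerically for the sample value \(\beta = 0.6\) (so \(\gamma = 1.25\)). The function names and the sample value are assumptions made for the illustration, and vectors are ordered as \([x, \tau]\) as in the text.

```python
import math

Z2 = [[1.0, 0.0], [0.0, -1.0]]   # interval form on [x, tau]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def candidate(beta, e1, e2):
    """A = gamma * diag(e1, e2) * [[1, -beta], [-beta, 1]] with e1, e2 = +1 or -1."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma * e1, -gamma * e1 * beta],
            [-gamma * e2 * beta, gamma * e2]]

def preserves_interval(a, tol=1e-12):
    """Check A^t Z A = Z entry by entry."""
    m = matmul(transpose(a), matmul(Z2, a))
    return all(abs(m[i][j] - Z2[i][j]) <= tol for i in range(2) for j in range(2))

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

results = {}
for e1 in (1, -1):
    for e2 in (1, -1):
        a = candidate(0.6, e1, e2)
        results[(e1, e2)] = (preserves_interval(a), det(a))

for signs, (ok, d) in results.items():
    print(signs, ok, round(d, 12))   # the determinant equals e1 * e2
```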
Now, a physical argument by continuity of \(\mathop{det}(A)\), similar to the one employed above shows that, since \(\mathop{det}(A) = 1\) when \(v=0\), it cannot suddenly jump to \(-1\) as \(v\) increases by small steps. Thus \(\e = 1\) and so \(\e_1 = \e_2\). Moreover, \[ \tau' = \e_2\gamma(\tau - \b x) \] Physically we expect time moves forward in both frames and so also \(\e_2 =1\). Thus \[ \begin{align} x' &= \gamma(x - \b\tau) \\ \tau' &= \gamma(\tau - \b x) \\ \end{align} \] Rephrasing in terms of \(t, t'\), \[ \begin{align} x' &= \gamma(x - vt) \\ ct' &= \gamma(ct - vx/c) \\ \end{align} \] or \[ \begin{align} x' &= \gamma(x - vt) \\ t' &= \gamma(t - vx/c^2) \\ \end{align} \] In summary, the full Lorentz Transformation is \[ \begin{align} t' &= \gamma(t - vx/c^2) \\ x' &= \gamma(x - vt) \\ y' &= y \\ z' &= z \\ \end{align} \]
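The final transformation can be written as a short function and checked against the properties used in its derivation. This is a minimal sketch: the function names `lorentz` and `interval`, the default units with \(c = 1\), and the sample event are assumptions made for illustration.

```python
import math

def lorentz(event, v, c=1.0):
    """Map (t, x, y, z) in S to (t', x', y', z') in S', where S' moves at speed v along +x."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (gamma * (t - v * x / c ** 2), gamma * (x - v * t), y, z)

def interval(event, c=1.0):
    """Spacetime interval between the origin and the event (t, x, y, z)."""
    t, x, y, z = event
    return (c * t) ** 2 - x ** 2 - y ** 2 - z ** 2

event = (2.0, 1.0, 0.5, -0.25)
moved = lorentz(event, v=0.6)
back = lorentz(moved, v=-0.6)   # applying the transformation with -v undoes it

# The origin of S' sits at x = v t in S and should map to x' = 0.
origin_of_s_prime = lorentz((3.0, 0.6 * 3.0, 0.0, 0.0), v=0.6)

print(interval(event), interval(moved))  # equal up to rounding
print(back)                              # recovers the original event up to rounding
print(origin_of_s_prime[1])              # zero up to rounding
```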