Monday, November 17, 2014

Function Whose Second Derivative is the Square of its First Derivative

I was working on different forms of the Laplacian of a general product of scalar functions (which I might share in a later post), and I came to a point where I wanted to ask myself: when is $$\nabla{f}\cdot\nabla{f}=\Delta{f}$$ which is similar to asking when $$\frac{\partial^2 f}{\partial x^2} = \left(\frac{\partial f}{\partial x}\right)^2$$ In fact, the latter is a special case of the former. And if the partials with respect to $x,y,z$ all follow this, then we have a term-wise equality between the dot product of the gradient and the Laplacian. But when does this happen?

Well, let's start with a function of one variable. So we have the ODE $$f''(x)- \left(f'(x)\right)^2=0$$ First, substitute $g(x) = f'(x)$. $$g'-g^2=0$$ Clearly we can write the following $$\begin{align*} g&=g\\ g'&=g^2\\ g''=2gg'&=2g^3\\ g^{(3)}=6g^2g'&=6g^4\\ g^{(4)}=24g^3g'&=24g^5\\ &\vdots \end{align*}$$ And in general it appears that $$g^{(n)}=n!g^{n+1}$$ We can prove that with induction. The base cases are done. And the induction step is fairly straightforward $$g^{(n+1)} = \left(g^{(n)}\right)' = \left(n!g^{n+1}\right)' = \left(n+1\right)n!g^ng' = \left(n+1\right)!g^{n+2}$$

Let's try to write the Taylor series for $g$ now, letting $g_0=g(x_0)$. $$g(x) = \sum_{n=0}^\infty\frac{g^{(n)}(x_0)}{n!}\left(x-x_0\right)^n = \sum_{n=0}^\infty\left(g_0\right)^{n+1}\left(x-x_0\right)^n=g_0\sum_{n=0}^\infty\left(g_0\left(x-x_0\right)\right)^n$$ This last part looks an awful lot like the geometric series $(1-x)^{-1}$, except with $x$ shifted by $x_0$ and the value scaled by $g_0$. So really we have $$g(x) = \frac{g_0}{1-g_0\left(x-x_0\right)}=\frac{-a}{ax+b}$$ where $a=-g_0$ and $b=1+g_0x_0$. Now since we said that $g=f'$ we just integrate once to find $f$ $$f(x) = \int\frac{-a}{ax+b}\,\mathrm dx$$ A simple $u$-substitution with $u=ax+b$ yields $$f(x) = c - \ln(ax+b)$$ Double checking reveals that this does, in fact, solve our ODE. We could also separate variables and integrate to arrive at this same answer.

Also, a buddy of mine shared the intuition almost immediately that $\ln(x)$ is a basic function that gets us close to the solution, and putting a negative out front fixes the one sign issue that shows up. So we can just guess that our function looks like $$f(x) = -\ln(g(x))$$ for some function $g(x)$. Putting this into our original equation yields $$\frac{-g''(x)g(x)+g'(x)^2}{g(x)^2}=\frac{g'(x)^2}{g(x)^2}$$ So either $g(x) = 0$ or $g''(x)=0$. The former means we'd be dividing by $0$ (plus it's a boring answer), but the latter just implies that our function is linear. So $g(x)=ax+b$. Lastly, we just note that we only restrict the derivative, so we can add an arbitrary constant. $$f(x) = c - \ln(ax+b)$$ The same thing we got before!

But how can we come up with a special case where for $x,y,z$ we have $\nabla{f}\cdot\nabla{f}=\Delta{f}$? Well, a really simple solution would be $$f(x,y,z) = e - \ln(ax+by+cz+d)$$ with $a,b,c,d,e$ constants. The $by+cz+d$ group acts as a "constant" when taking the partial derivative with respect to $x$, and the rest of the partials follow similarly!

Now, I don't know if this is the only solution to our original equation because we worked from the single variable case up. But it's a simple solution.
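As a quick sanity check, here's a small SymPy sketch (not part of the original derivation; it just symbolically verifies that the candidate solutions above satisfy the equations):

from sympy import symbols, log, diff, simplify

x, y, z, a, b, c, d, e = symbols('x y z a b c d e')

## One-variable check: f = e - ln(a*x + b) should satisfy f'' = (f')^2
f = e - log(a*x + b)
print(simplify(diff(f, x, 2) - diff(f, x)**2))  ## expect 0

## Three-variable check: f = e - ln(a*x + b*y + c*z + d) should satisfy
## grad(f).grad(f) = Laplacian(f)
f3 = e - log(a*x + b*y + c*z + d)
grad_dot_grad = sum(diff(f3, v)**2 for v in (x, y, z))
laplacian = sum(diff(f3, v, 2) for v in (x, y, z))
print(simplify(grad_dot_grad - laplacian))  ## expect 0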

Tuesday, September 16, 2014

Every square matrix is the sum of a unique symmetric and a unique skew-symmetric matrix

This one is going to be a brief post. I came across this interesting property in my advanced solids course, as well.

Consider the square matrix $\mathbf{A}$. Clearly we can write $\mathbf{A}$ like this $$\begin{align*} \mathbf{A}&=\frac{1}{2}\left(\mathbf{A}+\mathbf{A}+\mathbf{A}^T-\mathbf{A}^T\right)\\ &=\frac{1}{2}\left(\mathbf{A}+\mathbf{A}^T\right)+\frac{1}{2}\left(\mathbf{A}-\mathbf{A}^T\right) \end{align*}$$ Let $\mathbf{B}=\frac{1}{2}(\mathbf{A}+\mathbf{A}^T)$ and $\mathbf{C}=\frac{1}{2}(\mathbf{A}-\mathbf{A}^T)$. So $\mathbf{A}=\mathbf{B}+\mathbf{C}$. Based on the property that $(\mathbf{A}+\mathbf{B})^T=\mathbf{A}^T+\mathbf{B}^T$ it's easy to verify that $\mathbf{B}$ is symmetric $(\mathbf{B}=\mathbf{B}^T)$ and that $\mathbf{C}$ is skew-symmetric $(\mathbf{C}=-\mathbf{C}^T)$. Therefore a solution exists.
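Here's a minimal NumPy sketch of this construction (the matrix below is arbitrary, just for illustration):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

## Symmetric and skew-symmetric parts
B = 0.5 * (A + A.T)
C = 0.5 * (A - A.T)

print(np.allclose(A, B + C))   ## True: the parts recover A
print(np.allclose(B, B.T))     ## True: B is symmetric
print(np.allclose(C, -C.T))    ## True: C is skew-symmetric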

But, is this solution unique? Assume we have two solutions: $\mathbf{A}=\mathbf{B}_1+\mathbf{C}_1=\mathbf{B}_2+\mathbf{C}_2$ where $\mathbf{B}_i$ is symmetric and $\mathbf{C}_i$ is skew-symmetric. Since $\mathbf{B}_1+\mathbf{C}_1=\mathbf{B}_2+\mathbf{C}_2$ we know that $$\mathbf{B}_1-\mathbf{B}_2=\mathbf{C}_2-\mathbf{C}_1$$ It's easy to show that the sum of two symmetric matrices is symmetric and the sum of two skew-symmetric matrices is skew-symmetric. Therefore, the left-hand side is symmetric and the right-hand side is skew-symmetric. Because they are equal, both sides are simultaneously symmetric and skew-symmetric, and the only such matrix is the zero matrix (if $\mathbf{M}=\mathbf{M}^T$ and $\mathbf{M}=-\mathbf{M}^T$, then $\mathbf{M}=-\mathbf{M}$, so $\mathbf{M}=\mathbf{0}$). $$\mathbf{B}_1-\mathbf{B}_2=\mathbf{C}_2-\mathbf{C}_1=\mathbf{0}$$ And lastly, we see that $$\mathbf{B}_1=\mathbf{B}_2$$ $$\mathbf{C}_1=\mathbf{C}_2$$ And really, we were looking at the same representation all along. Therefore, it is unique!

In the course we were actually talking about how the displacement gradient $\left(\nabla\mathbf{u}\right)$ is the sum of the infinitesimal strain tensor $\left(\mathbf{\varepsilon}\right)$, which is symmetric, and the infinitesimal rotation tensor $\left(\mathbf{\omega}\right)$, which is skew-symmetric. But the same principle applies to the much easier concept of matrices.

Wednesday, September 10, 2014

Temperature Distribution in a Semi-Infinite Wall

Any introductory heat transfer class solves for temperature distributions in plates, solids, etc. The temperature distribution is a solution to the heat diffusion partial differential equation. The general form of the heat diffusion equation with no internal heat generation in Cartesian coordinates is $$\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} = \frac{1}{\alpha}\frac{\partial T}{\partial t}$$ The first series of problems typically tackled in the intro classes are the steady-state solutions. These are the cases where $\partial T / \partial t = 0$. And the very first problems even restrict us to only one dimension. Slowly we add in dimensions and time dependence.

However, one problem that I find particularly interesting is the semi-infinite wall. The problem involves a half-space that represents a wall. A very thick wall. One might call it "infinite". So thick that the back end of the wall never receives any heat in the time span that we are concerned with. Therefore, we approximate it as an infinitely thick wall. Additionally, we will assume the wall's area is very large (or that its boundary is so far away from the area that we're concerned with) and therefore the face of the wall is infinite. And due to symmetry, we can restrict the problem to be in only one space dimension: perpendicular to the face of the wall. Our partial differential equation reduces to $$\frac{\partial^2 T}{\partial x^2}=\frac{1}{\alpha}\frac{\partial T}{\partial t}$$ This is second order in position and first order in time. Therefore we require two boundary conditions and an initial condition. The initial condition is simple: we will assume that the wall starts off uniform in temperature. $$T(x,0)=T_0$$

Next we have to come up with the boundary conditions. In heat transfer there are three primary surface conditions: constant temperature, constant flux, and convection. The choice of surface condition depends on the physical context that you want to model. However, for the purpose of this solution we will assume that the surface condition is a constant temperature, meaning that after $t=0$ we instantly change the temperature of the face of the wall to $$T(0,t)=T_s$$ Assuming either of the other two surface conditions changes very little, but requires us to introduce new constants and more complex algebra. As for the last boundary condition, we have already hinted at its nature: the wall at infinity is the same as it is at time $0$. $$T(x\to\infty,t)=T_0$$

But now we have to solve the PDE. But what if we don't know how to solve PDEs? This is what I find fascinating about this problem; we can turn this PDE of two variables into an ODE of one variable with a similarity variable [1] $$\eta=\frac{x}{\sqrt{4\alpha t}}$$ We now try to get $\frac{\partial^2 T}{\partial x^2}$ and $\frac{\partial T}{\partial t}$, which are in terms of $T,x,t$, into terms of just $T$ and $\eta$.
$$\begin{align*}\frac{\partial^2 T}{\partial x^2}&=\frac{\partial}{\partial x}\left(\frac{\partial T}{\partial x}\right)=\frac{\partial}{\partial x}\left(\frac{\partial T}{\partial \eta}\frac{\partial \eta}{\partial x}\right)=\frac{\partial}{\partial x}\left(\frac{\partial T}{\partial \eta}\frac{1}{\sqrt{4\alpha t}}\right)\\ &=\frac{1}{\sqrt{4\alpha t}}\frac{\partial}{\partial x}\left(\frac{\partial T}{\partial \eta}\right)=\frac{1}{\sqrt{4\alpha t}}\left(\frac{\partial^2 T}{\partial x\partial \eta}\right)=\frac{1}{\sqrt{4\alpha t}}\left(\frac{\partial^2 T}{\partial \eta\partial x}\right)\\ &=\frac{1}{\sqrt{4\alpha t}}\frac{\partial}{\partial \eta}\left(\frac{\partial T}{\partial x}\right)=\frac{1}{\sqrt{4\alpha t}}\frac{\partial}{\partial \eta}\left(\frac{\partial T}{\partial \eta}\frac{\partial \eta}{\partial x}\right)\\ &=\frac{1}{\sqrt{4\alpha t}}\frac{\partial}{\partial \eta}\left(\frac{\partial T}{\partial \eta}\frac{1}{\sqrt{4\alpha t}}\right)=\frac{1}{4\alpha t}\frac{\partial^2 T}{\partial \eta^2} \end{align*}$$ The above depends on the continuity of $T$ and the independence of $x$ and $t$. Both are very appropriate assumptions for our problem. Likewise we can show $$\frac{\partial T}{\partial t}=\frac{\partial T}{\partial \eta}\frac{\partial \eta}{\partial t}=-\frac{x}{2t\sqrt{4\alpha t}}\frac{\partial T}{\partial \eta}$$ Putting these back into our heat equation $$\frac{1}{4\alpha t}\frac{\partial^2 T}{\partial \eta^2}=-\frac{1}{\alpha}\frac{x}{2t\sqrt{4\alpha t}}\frac{\partial T}{\partial \eta}$$ which can simplify to $$\begin{align*} \frac{\partial^2 T}{\partial \eta^2}&=-\frac{x\sqrt{4\alpha t}}{2\alpha t}\frac{\partial T}{\partial \eta}=-\frac{x}{x}\frac{x\sqrt{4\alpha t}}{2\alpha t}\frac{\partial T}{\partial \eta}=-2\frac{\sqrt{4\alpha t}}{x}\frac{x^2}{4\alpha t}\frac{\partial T}{\partial \eta}\\ &=-2\frac{x}{\sqrt{4\alpha t}}\frac{\partial T}{\partial \eta}=-2\eta\frac{\partial T}{\partial \eta} \end{align*}$$ And since it's all in terms of $\eta$ we can turn the partials back into total derivatives $$\frac{\mathrm{d}^2 T}{\mathrm{d} \eta^2}=-2\eta\frac{\mathrm{d}T}{\mathrm{d}\eta}$$ Now we are left to convert our boundary conditions into $\eta$. Our surface boundary condition, the condition where $x=0$, clearly corresponds to the case where $\eta=0$, and therefore $T(0)=T_s$. And the two conditions $t=0$ and $x\to\infty$ both clearly correspond to the case where $\eta\to\infty$. Therefore, $T(\eta\to\infty)=T_0$. Thus the three conditions on the PDE are now two conditions on the ODE. I'm not going to solve this ODE because it's a standard separation of variables solution. I was only interested in sharing the conversion from the PDE of two variables into the ODE of one variable. I will, however, spoil the ending and share the solution: $$ \DeclareMathOperator\erf{erf} \frac{T(\eta)-T_s}{T_0-T_s}=\erf{\eta}$$ where $\erf{\eta}$ is the error function (Go figure!). However, we can always substitute back in $x$ and $t$ because $T(\eta)=T(x,t)$. Therefore, in conclusion $$\frac{T(x,t)-T_s}{T_0-T_s}=\erf{\left(\frac{x}{\sqrt{4\alpha t}}\right)}$$
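If you want to actually evaluate the solution, here's a small sketch (the thermal diffusivity and temperatures below are made-up numbers, purely for illustration):

from math import erf, sqrt

def wall_temperature(x, t, T_0, T_s, alpha):
   ## Temperature at depth x (m) and time t (s) in a semi-infinite wall with
   ## initial temperature T_0, surface temperature T_s, and diffusivity alpha (m^2/s)
   eta = x / sqrt(4.0 * alpha * t)
   return T_s + (T_0 - T_s) * erf(eta)

## Hypothetical concrete-like wall after one hour
alpha = 7e-7            ## m^2/s
T_0, T_s = 20.0, 100.0  ## deg C
for x in [0.0, 0.01, 0.05, 0.10]:
   print(x, round(wall_temperature(x, 3600.0, T_0, T_s, alpha), 2))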

[1] Incropera, Frank P. Fundamentals of Heat and Mass Transfer. Hoboken, NJ: Wiley, 2013. Print.

Friday, September 5, 2014

Alternative Definition for a Skew-Symmetric Matrix

I was working on some advanced solid mechanics homework and a problem asked us to prove a particular identity for $2^{\text{nd}}$-order tensors. I really enjoyed the solution that I came up with. It's the type of proof that goes
Suppose some statement $A$ is true for all $x$. We want to show that statement $B$ is true. Consider the very special case where $x=(\text{randomness pulled out of thin air, but makes the problem very simple})$... and the proof goes on very smoothly because of that particular choice of $x$.
Well, I finally managed to come up with one of those proofs! I believe the book only intended for the problem to be solved in the three-dimensional case, where writing out the explicit possibilities is very feasible and most likely the expected solution. I will share my (arguably prettier) solution, but I will translate it into terms of linear algebra, matrices, and vectors instead of tensors. But first, some background.

Definition 1. A skew-symmetric matrix is a matrix $\mathbf{A}$ such that $\mathbf{A}=-\mathbf{A}^T$.

Here $^T$ denotes the transpose operator (i.e. flip along the main diagonal). We will use the notation $A_{ij}$ to denote the element of $\mathbf{A}$ in the $i^{\text{th}}$ row and $j^{\text{th}}$ column. Therefore, this definition can be written as $$A_{ij}=-A_{ji}$$ Here are a few examples of skew-symmetric matrices: $$\left[\begin{matrix}0&1&1\\ -1&0&1\\ -1&-1&0\end{matrix}\right] \quad\left[\begin{matrix}0&1&-2&3\\ -1&0&-4&5\\ 2&4&0&6\\ -3&-5&-6&0\end{matrix}\right] \quad\left[\begin{matrix}0&a\\ -a&0\end{matrix}\right]$$ Note that all have $0$'s down the main diagonal because $0$ is the only number that solves $x=-x$. Now, the alternative definition to this type of matrix is:

Definition 2. For all vectors $\mathbf{u}$, $\mathbf{A}$ is skew-symmetric if $\mathbf{u}^T\mathbf{Au}=0$.

And to show that this definition is the same as the previous one, all we need to do is show the following theorem, which is the problem in the book but in terms of matrices and vectors and not tensors (maybe another day!).

Theorem 1. For all vectors $\mathbf{u}$, $\mathbf{u}^T\mathbf{Au}=0$ if and only if $A_{ij}=-A_{ji}$.

Proof.
$(\Leftarrow)$ Assume that $A_{ij}=-A_{ji}$. We want to show that for all $\mathbf{u}$, $\mathbf{u}^T\mathbf{Au}=0$. Consider $\mathbf{u}^T\mathbf{Au}$. It can be shown that $$\mathbf{u}^T\mathbf{Au}=\sum_{i=1}^n\sum_{j=1}^nA_{ij}u_iu_j$$ where the vectors are of length $n$ and the matrix is of size $n\times n$. And since $A_{ij}=-A_{ji}$, we know that $$\sum_{i=1}^n\sum_{j=1}^nA_{ij}u_iu_j=\sum_{i=1}^n\sum_{j=1}^n-A_{ji}u_iu_j$$ But $i$ and $j$ are just dummy variable names and we can swap them, as long as we swap all of them. So: $$\sum_{i=1}^n\sum_{j=1}^nA_{ij}u_iu_j=\sum_{j=1}^n\sum_{i=1}^n-A_{ij}u_ju_i$$ But summation order doesn't matter, multiplication is commutative, and we can pull the negative out. $$\left(\sum_{i=1}^n\sum_{j=1}^nA_{ij}u_iu_j\right)=-\left(\sum_{i=1}^n\sum_{j=1}^nA_{ij}u_iu_j\right)$$ And, as stated above, the only solution to $x=-x$ is $0$. Thus $\mathbf{u}^T\mathbf{Au}=0$. That was the easy direction. Now for the fun one!
$(\Rightarrow)$ Assume $\mathbf{u}^T\mathbf{Au}=0$ for all $\mathbf{u}$. We want to show that $A_{ij}=-A_{ji}$. Going back to the summations we have $$\mathbf{u}^T\mathbf{Au}=\sum_{i=1}^n\sum_{j=1}^nA_{ij}u_iu_j=0$$ Now, consider the vectors $\mathbf{u}^{(r,s)}$ such that $u_i=\delta_{ri}+\delta_{si}$ where $\delta_{ij}$ is the Kronecker delta function. $$\delta_{ij}=\begin{cases}1&i=j\\ 0&i\ne j\end{cases}$$ Some examples of these vectors are $$\begin{align*} \mathbf{u}^{(1,1)}=\left[2,0,0\right]\\ \mathbf{u}^{(2,1)}=\left[1,1,0\right]\\ \mathbf{u}^{(3,1)}=\left[1,0,1\right] \end{align*}$$ Those were all examples in three dimensions, however, these can be in any number of dimensions. For instance this ten dimensional version $$\mathbf{u}^{(4,7)}=\left[0,0,0,1,0,0,1,0,0,0\right]$$ Basically, if $r=s$, then the vector has all zero components with the exception of the $r^{\text{th}}$ component which is $1+1=2$. Or if $r\ne s$ then the vector still has all zero components with the exception of the $r^{\text{th}}$ component which is $1+0=1$ and the $s^{\text{th}}$ component which is $0+1=1$.

Consider first, the case where $r=s$. We know that $u_i^{(r,r)}=2\delta_{ri}$ and thus $$0=\sum_{i=1}^n\sum_{j=1}^nA_{ij}u_iu_j=\sum_{i=1}^n\sum_{j=1}^nA_{ij}(2\delta_{ri})(2\delta_{rj})$$ Since $\delta_{ri}$ and $\delta_{rj}$ are zero unless $i=r$ and $j=r$, we can drop the summation and just use $i=j=r$ and the deltas become multiplications by $1$. $$0=A_{rr}(2)(1)(2)(1)=4A_{rr}$$ Thus $A_{ii}=0$ and we have zeros down the main diagonal. Step in the right direction!

Now consider the case where $r\ne s$. We know that $u_i^{(r,s)}=\delta_{ri}+\delta_{si}$. Thus $$0=\sum_{i=1}^n\sum_{j=1}^nA_{ij}u_iu_j=\sum_{i=1}^n\sum_{j=1}^nA_{ij}(\delta_{ri}+\delta_{si})(\delta_{rj}+\delta_{sj})$$ We can foil and distribute and get $$0=\sum_{i=1}^n\sum_{j=1}^n\left(A_{ij}\delta_{ri}\delta_{rj}+A_{ij}\delta_{ri}\delta_{sj}+A_{ij}\delta_{si}\delta_{rj}+A_{ij}\delta_{si}\delta_{sj}\right)$$ The summand is only non-zero for four cases: $(r=i=j)$, $(r=i,s=j)$, $(s=i,r=j)$, $(s=i=j)$. If one of these isn't true, then the summand is $0+0+0+0=0$. Thus, we can take just the four terms with those $i$ and $j$ values and drop the summation. $$\begin{align*} 0=&\left(A_{rr}\delta_{rr}\delta_{rr}+A_{rr}\delta_{rr}\delta_{sr}+A_{rr}\delta_{sr}\delta_{rr}+A_{rr}\delta_{sr}\delta_{sr}\right)+\\ &\left(A_{rs}\delta_{rr}\delta_{rs}+A_{rs}\delta_{rr}\delta_{ss}+A_{rs}\delta_{sr}\delta_{rs}+A_{rs}\delta_{sr}\delta_{ss}\right)+\\ &\left(A_{sr}\delta_{rs}\delta_{rr}+A_{sr}\delta_{rs}\delta_{sr}+A_{sr}\delta_{ss}\delta_{rr}+A_{sr}\delta_{ss}\delta_{sr}\right)+\\ &\left(A_{ss}\delta_{rs}\delta_{rs}+A_{ss}\delta_{rs}\delta_{ss}+A_{ss}\delta_{ss}\delta_{rs}+A_{ss}\delta_{ss}\delta_{ss}\right) \end{align*}$$ Now remember that $A_{ii}=0$ so the first and last lines are all $0$. And we can cross out any term with a $\delta_{rs}$ or $\delta_{sr}$. And we can replace any $\delta_{rr}$ or $\delta_{ss}$ with a multiplication by $1$. This leaves us with $$0=A_{rs}+A_{sr}$$ And this implies that $A_{ij}=-A_{ji}$.

I will say that this was both simpler and much prettier in Einstein tensor notation. But with our choice of $\mathbf{u}$ we had all but $4$ of the $n^2$ terms cancel out in one step, and $14$ of the $16$ terms cancel out in the second step. The remainder of the proof was easy!
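Here's a quick numerical sanity check of the theorem (a NumPy sketch with a random matrix, not part of the original homework):

import numpy as np

rng = np.random.default_rng(0)

## M - M^T is skew-symmetric for any square M
M = rng.standard_normal((4, 4))
A = M - M.T

## Definition 2: u^T A u should vanish for every u
for _ in range(5):
   u = rng.standard_normal(4)
   print(abs(u @ A @ u) < 1e-12)   ## True each time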

Tuesday, August 19, 2014

The Geometric Mean is just the Arithmetic Mean of the Exponents

This one is going to be brief, but I find it interesting. A buddy of mine a few months back sent me a text about this realization: the geometric mean is just the arithmetic mean of the exponents! I'll explain in a second, but first a quick background.

An arithmetic mean is what we commonly refer to as the average: add all the elements up and divide by the number of elements. $$\overline{x}=\frac{x_1+x_2+\cdots+x_n}{n}$$ Alternatively, we have the geometric mean, which is the multiplicative analog to the arithmetic mean: the geometric mean is the $n$th root of the product of the elements. $$\overline{y}'=\left(y_1y_2\cdots y_n\right)^{\frac{1}{n}}$$ The connection is as follows. Each number $y_i$ (assuming they are all positive) can be expressed as $y_i=a^{b_i}$ for some $a>0$ and $b_i\in\mathbb{R}$. Therefore we have the following $$\begin{align*} \overline{y}'&=\left(y_1y_2\cdots y_n\right)^{\frac{1}{n}}\\ &=\left(a^{b_1}a^{b_2}a^{b_3}\cdots a^{b_n}\right)^{\frac{1}{n}}\\ &=\left(a^{b_1+b_2+\cdots+b_n}\right)^{\frac{1}{n}}\\ &=a^{(b_1+b_2+\cdots+b_n)/n} \end{align*}$$ which states that if each $y_i$ can be expressed as $y_i=a^{b_i}$ for some $a>0$ and $b_i\in\mathbb{R}$, then the geometric mean of all $y_i$ is the base, $a$, raised to the arithmetic mean of all exponents $b_i$. $$\overline{y}'=a^{\overline{b}}$$ Pretty interesting, eh?
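Here's a tiny numerical sketch of the connection (the base $a=2$ and the data are arbitrary):

from math import log

ys = [2.0, 8.0, 32.0]            ## arbitrary positive data
a = 2.0                          ## arbitrary base
bs = [log(y, a) for y in ys]     ## exponents: y_i = a**b_i, here 1, 3, 5

geometric_mean = (ys[0] * ys[1] * ys[2]) ** (1.0 / len(ys))
via_exponents = a ** (sum(bs) / len(bs))   ## a raised to the mean of the b_i

print(geometric_mean, via_exponents)       ## both ~8.0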

Saturday, July 12, 2014

Battle Mechanic Computations (Part 1)

The other day on the Mathematics StackExchange, I came across a question about a closed form solution for a certain set of battle mechanics. The problem intrigued me and I found some interesting results. Here's some of what I found.

The battle mechanics are as follows. For battling armies $X$ and $Y$, a battle consists of rounds of (what I will call) conflicts. Each conflict is a $1$ on $1$ interaction between a member of $X$ and a member of $Y$ with an equal chance of victory for both sides. The loser of a conflict is "dead" and the winner can compete in the next conflict. The last army with men is the victor.

It turns out that it's possible to quickly compute the probable outcome. Let $P(x,y)$ be the probability that a battle, between army $X$ with $x$ men and army $Y$ with $y$ men, results in a victory for army $X$. The battle mechanics can be summarized in three rules: $$P(x,0) = 1$$ $$P(0,y) = 0$$ $$P(x,y) = \frac{1}{2}P(x,y-1) + \frac{1}{2}P(x-1,y)$$ The first two rules are pretty self explanatory; if army $Y$ has no men, army $X$ wins, and vice versa. The third can be understood by looking at a conflict. A conflict has two results: army $X$ wins or army $Y$ wins. If army $X$ wins, then army $Y$ loses a man and the rest of the battle has the same odds as a battle of two armies of size $x$ and $y-1$, respectively (fatigue isn't considered, so the member of army $X$ in the first conflict is indistinguishable from any other member of army $X$ after that first conflict). Likewise, a conflict win for army $Y$ results in the rest of the battle having the same odds as that of two armies of size $x-1$ and $y$. And since we have equal odds, there's a $50\%$ chance of the former and a $50\%$ chance of the latter.

I will show that $$P(x,y)=\frac{1}{2^{x+y-1}}\sum_{i=0}^{x-1}\dbinom{x+y-1}{i}$$ To start, let's begin with armies of size $x$ and $y$ and start expanding with our third rule. Each of the below are equivalent. $$\frac{1}{1}\left[P(x,y)\right]$$ $$\frac{1}{2}\left[P(x,y-1) + P(x-1,y)\right]$$ $$\frac{1}{4}\left[P(x,y-2) + 2P(x-1,y-1)+P(x-2,y)\right]$$ Let's stop here and explain what we're doing. From the first line to the second is pretty obvious, we just invoked our rule and pulled out the common factor of $1/2$. The progression from the second line to the third line invokes the rule twice (once for each of the evaluations of $P$). The left $$P(x,y-1)=\frac{1}{2}\left[P(x,y-2)+P(x-1,y-1)\right]$$ and the right $$P(x-1,y)=\frac{1}{2}\left[P(x-1,y-1)+P(x-2,y)\right]$$ Putting them together involves pulling out the common factor of $1/2$ (and combining this with the already factored out $1/2$) and then combining both terms of the form $P(x-1,y-1)$. Let's proceed with the expansion, but for the sake of space, we will denote $P(x-a,y-b)=P_{a,b}$ $$\frac{1}{8}\left[P_{0,3} + 3P_{1,2}+3P_{2,1}+P_{3,0}\right]$$ $$\frac{1}{16}\left[P_{0,4} + 4P_{1,3}+6P_{2,2}+4P_{3,1}+P_{4,0}\right]$$ And by now, it should be strikingly apparent that the coefficients are rows of Pascal's triangle and we can rewrite $P(x,y)$ as $$P(x,y)=\frac{1}{2^j}\sum_{i=0}^j\dbinom{j}{i}P(x-i,y-j+i)$$ where $j$ is the number of conflicts so far. However, there is a bound on the number of conflicts. For instance, the battle can't be over until at least $\min(x,y)$ conflicts have taken place (otherwise, no one is out of men). And as an upper bound, no more than $x+y-1$ conflicts can happen. If $x+y$ conflicts took place, then a total of $x+y$ men would be dead and both armies would be gone. Therefore, we take one less conflict and this lone man is the victor! With this upper bound, we can write $$\begin{align*}P(x,y)&=\frac{1}{2^{x+y-1}}\sum_{i=0}^{x+y-1}\dbinom{x+y-1}{i}P(x-i,y-(x+y-1)+i)\\ &=\frac{1}{2^{x+y-1}}\sum_{i=0}^{x+y-1}\dbinom{x+y-1}{i}P(x-i,1+i-x)\end{align*}$$ However, for all $i\geq x$ the first argument is nonpositive (army $X$ is out of men), so those terms are $0$ and we can ignore any $i\geq x$. $$P(x,y)=\frac{1}{2^{x+y-1}}\sum_{i=0}^{x-1}\dbinom{x+y-1}{i}P(x-i,1+i-x)$$ Conversely, for $i\leq x-1$ the second argument is nonpositive (army $Y$ is out of men), so those terms are $1$. Therefore we have removed the dependency on $P$ on the RHS. $$P(x,y)=\frac{1}{2^{x+y-1}}\sum_{i=0}^{x-1}\dbinom{x+y-1}{i}$$ We have shown what I've wanted to show here. In words: the probability that army $X$ with $x$ soldiers beats army $Y$ with $y$ soldiers is the sum of the first $x$ elements in the $(x+y-1)$th row of Pascal's triangle divided by the total sum of that row! I will discuss the properties and shortcuts to this equation in the next part. Just as a preview: $$P(x,x) = \frac{1}{2}$$ $$P(x,y) + P(y,x) = 1$$ $$P(x,1) = \frac{2^x-1}{2^x}$$ $$P(x,2) = \frac{2^{x+1}-x-2}{2^{x+1}}$$ $$P(x,3) = \frac{2^{x+3}-x^2-5x-8}{2^{x+3}}$$
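Here's a short sketch (not from the original StackExchange post) that checks the closed form against the three rules directly:

from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def P_recursive(x, y):
   ## Probability that army X (x men) beats army Y (y men), straight from the rules
   if y == 0:
      return 1.0
   if x == 0:
      return 0.0
   return 0.5 * P_recursive(x, y - 1) + 0.5 * P_recursive(x - 1, y)

def P_closed(x, y):
   ## First x entries of row x+y-1 of Pascal's triangle, divided by 2^(x+y-1)
   return sum(comb(x + y - 1, i) for i in range(x)) / 2 ** (x + y - 1)

for x, y in [(1, 1), (3, 2), (5, 7), (10, 10)]:
   print(x, y, P_recursive(x, y), P_closed(x, y))   ## the two columns agree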

Monday, June 23, 2014

Sum of Three Squares and Factors of Two

Background

I've been working a lot recently with the sums of squares (I have a previous post about finding the Exhaustive List of the Sums of Four-Squares). This work has been for some research and also for Project Euler. However, I came across an interesting result: If $n$ is even and $a^2 + b^2 + c^2 = n^2$, then $a,b,$ and $c$ are also even.

To kind of tip-toe around the issue, we know that there is at least one solution $a,b,c$ such that this is true: $a=n, b=c=0$. Others may and do exist, but we won't fret too much right now about what they are. All we need to know is that there is at least one solution. Since all natural numbers are either even or odd, we have four possibilities: all $a,b,c$ are odd, two are odd and one is even, one is odd and two are even, or all three are even.

We can easily rule out the cases in which an odd number of them are odd (that is, exactly one or all three). Because we know $n$ is even, $n^2$ must also be even. However, an odd number of odd squares on the other side (remember an odd squared is still odd) means the entire other side is odd. And, since an odd cannot equal an even, we have a contradiction. However, in order to eliminate the case where exactly two are odd, we must do a little more work.

Assume, without loss of generality, that $a$ and $b$ are odd and $c$ is even and we will prove a contradiction. Let $d,e,f,g\in\mathbb{N}$ be such that $$\begin{array}{c} a = 2d + 1 \\ b = 2e + 1 \\ c = 2f \\ n = 2g \end{array} $$ We know that $$n^2 = (2g)^2 = 4g^2$$ And we know that $$\begin{array}{r l} n^2 &= a^2 + b^2 + c^2 \\ &= (2d+1)^2 + (2e+1)^2 + (2f)^2 \\ & = 4d^2 + 4d + 1 + 4e^2 + 4e + 1 + 4f^2 \\ &= 4(d^2 + d + e^2 + e + f^2) + 2 \end{array}$$ Putting the two together: $$4g^2 = 4(d^2 + d + e^2 + e + f^2) + 2$$ Clearly, the left-hand side is a multiple of $4$; however, the right-hand side is not. Therefore, these two cannot be equal and we cannot have exactly two odd terms. We won't try to eliminate the last possibility (that all three be even) because we know that at least one solution exists and it has to fall into one of those categories.

Now, this probably doesn't seem like that interesting of a result. And honestly speaking, it's really not particularly interesting. However, we can generalize it:

Theorem


If $2^k\vert n$ for some $k\in\mathbb{N}$ and $a^2 + b^2 + c^2 = n^2$ for $a,b,c\in \mathbb{N}$, then $2^k\vert a,b,c$.


Proof

For $k=1$, this simplifies to the above proof, and $k=0$ is trivial. As such, we have base cases and we will proceed by induction. So, we assume that if $2^{k-1}\vert n$ for some $k\in \mathbb{N}$ and $a^2 + b^2 + c^2 = n^2$, then $2^{k-1}\vert a,b,c$. Now assume that $2^k\vert n$ and that $a^2 + b^2 + c^2 = n^2$. We know that there exists $m$ such that $n=2^km$. We also can show that $$n = 2^km = 2^{k-1}(2m)$$ which means that $2^{k-1}\vert n$. And, since $a^2 + b^2 + c^2 = n^2$, by our induction hypothesis we can conclude that $2^{k-1}\vert a,b,c$. This implies there exist $i,j,l\in \mathbb{N}$ such that $$\begin{array}{c} a = 2^{k-1}i \\ b = 2^{k-1}j \\ c = 2^{k-1}l \\ \end{array}$$ We can now classify $a,b,c$ based on the respective parity of $i,j,l$ (because again, $i,j,l$ must each be either even or odd). We, again, have four possibilities: all three are odd, two are odd and one is even, one is odd and two are even, and all three are even.

Case 1

Assume that all three are odd. Therefore, there exists $i_0, j_0, l_0$ such that $$\begin{array}{c} i = 2i_0 + 1 \\ j = 2j_0 + 1 \\ l = 2l_0 + 1 \end{array}$$ and furthermore $$\begin{array}{c} a = 2^{k-1}(2i_0 + 1) = 2^ki_0 + 2^{k-1} \\ b = 2^{k-1}(2j_0 + 1) = 2^kj_0 + 2^{k-1} \\ c = 2^{k-1}(2l_0 + 1) = 2^kl_0 + 2^{k-1} \\ \end{array}$$ Consider $$\begin{array}{r l} n^2 &= a^2 + b^2 + c^2 \\ & = (2^ki_0 + 2^{k-1})^2 + (2^kj_0 + 2^{k-1})^2 + (2^kl_0 + 2^{k-1})^2 \\ &= 2^{2k}(i_0^2+i_0+j_0^2+j_0+l_0^2+l_0) + 3(2^{2k-2}) \\ \end{array}$$ Note that this is not divisible by $2^{2k}$ because of the last term. On the contrary, note that $$n^2 = (2^km)^2 = 2^{2k}m^2$$ and therefore $2^{2k}\vert n^2$; a contradiction. Therefore, all three cannot be odd.

Case 2

Assume that two are odd and one is even. Therefore there exists $i_0,j_0, l_0$ such that $$\begin{array}{c} i = 2i_0 + 1 \\ j = 2j_0 + 1 \\ l = 2l_0 \end{array}$$ As before, we can substitute and write $a,b,c$ in terms of $i_0, j_0,l_0$ and as a result $n^2$ in terms of $i_0, j_0,l_0$. $$n^2 = 2^{2k}(i_0^2+i_0+j_0^2+j_0+l_0^2) + 2(2^{2k-2})$$ Note that this is not divisible by $2^{2k}$ because the last term only has a factor of $2^{2k-1}$. But again, because we know that $2^{2k}\vert n^2$, we have a contradiction. Therefore, two cannot be odd.

Case 3

Assume that one is odd and two are even. Therefore there exists $i_0,j_0, l_0$ such that $$\begin{array}{c} i = 2i_0 + 1 \\ j = 2j_0 \\ l = 2l_0 \end{array}$$ Once again, we can substitute and write $n^2$ in terms of $i_0, j_0,l_0$. $$n^2 = 2^{2k}(i_0^2+i_0+j_0^2+l_0^2) + 2^{2k-2}$$ Note that this is not divisible by $2^{2k}$ because of the last term. And again, because we know that $2^{2k}\vert n^2$, we have a contradiction. Therefore, one cannot be odd.

Case 4

That only leaves the option that all three are even. Since $i,j,l$ are even, we have $i_0,j_0, l_0$ such that $$\begin{array}{c} i = 2i_0 \\ j = 2j_0 \\ l = 2l_0 \end{array}$$ and we can now write $$\begin{array}{c} a = 2^{k-1}(2i_0) = 2^ki_0 \\ b = 2^{k-1}(2j_0) = 2^kj_0 \\ c = 2^{k-1}(2l_0) = 2^kl_0 \\ \end{array}$$ and we have shown that $2^k\vert a,b,c$!

Implications

What does this mean, though? It means that if we want to find the three integers whose squares sum to $n^2$ (such as lattice points on a sphere), then we can factor the powers of two out of our search and work with smaller numbers. This saves a great deal of computing time.

For instance, if we know that $2^k\vert n$ and we want to find $a,b,c$ such that $n^2 = a^2+b^2+c^2$, we can really just search for $d,e,f$ such that $m^2=d^2+e^2+f^2$ ($n = 2^km$). We jump back to the original problem by multiplying by $2^{2k}$ on both sides: $$\begin{array}{r l} m^2 & = d^2 + e^2 + f^2\\ 2^{2k}m^2 & = 2^{2k}(d^2 + e^2 + f^2) \\ (2^km)^2&= 2^{2k}d^2 + 2^{2k}e^2 + 2^{2k}f^2 \\ n^2 & = (2^kd)^2 + (2^ke)^2 + (2^kf)^2\\ n^2 & = a^2 + b^2 + c^2 \end{array} $$ Where clearly $d,e,f$ is the respective result of the division of $a,b,c$ by $2^k$.
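Here's a small sketch of that shortcut (the brute-force search is just for illustration):

def three_square_reps(m):
   ## All (d, e, f) with d >= e >= f >= 0 and d^2 + e^2 + f^2 = m^2, by brute force
   reps = []
   for d in range(m + 1):
      for e in range(d + 1):
         for f in range(e + 1):
            if d*d + e*e + f*f == m*m:
               reps.append((d, e, f))
   return reps

## If n = 2^k * m, search on the small m and scale every solution back up by 2^k
k, m = 3, 5
n = 2**k * m
scaled = [(2**k * d, 2**k * e, 2**k * f) for d, e, f in three_square_reps(m)]
print(all(a*a + b*b + c*c == n*n for a, b, c in scaled))   ## True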

Monday, June 9, 2014

2048 - Scoring (Part 2)

In Part 1 we talked about how to estimate the score of your game based on a single snapshot. However, we hadn't yet corrected for the fact that $4$'s sometimes spawn. So, to compensate for the number of $4$'s that spawn, we need to know how often they spawn! The online, open-source version has a $10\%$ chance of spawning a $4$, but based on nearly $38,000$ spawns on my iPhone app version (no, I didn't count them all; I'll show you how I came about this in a little bit), there is about a $15.3\%$ chance that a $4$ will spawn (leading me to believe that it's supposed to be a $15\%$ chance).

With that in mind, we need to have an estimate for the total number of tiles spawned. And under our old assumption (that only $2$'s spawn), we can compute that very easily. Consider the following. Every $2$ tile that spawns counts as a spawn. Therefore, every $4$ tile that we have is really two total tiles that spawned (the two $2$ tiles). The $8$'s are 4 spawns (two $4$ tiles each with two $2$ tiles spawning). It doesn't take long to figure out that each tile has $$c_a = a/2$$ where $a$ is the face value of the tile. Thus we just sum this for each of the $16$ tiles on the board and we have a good estimate for the number of total tiles that spawned, $C_e$ $$C_e = \sum_{16\;Tiles}c_a$$ Well, we just assumed that all the spawns were number $2$ tiles. To compensate for the fact that $4$ tiles can spawn at a probability of $p_4$ we know the following two relationships: $$p_4 = \frac{C_4}{C_a}$$ $$C_a = C_e - C_4$$ where $C_4, C_a, $ and $C_e$ are the number of $4$ tiles spawned, the actual number of total tiles spawned, and the estimated total tiles spawned, respectively. The first relationship is fairly straightforward; it's how the probability of a four-spawn is defined. However, the second relationship involves a little more thought. We guessed, based on only $2$ tiles spawning, that there are $C_e$ total spawns. If we actually know that there are $C_4$ four-spawns, then we have to account for the number of two-spawns that would've been smashed together to make these $4$'s. So, since there are two $2$'s that make a $4$, we have $C_2 = C_e - 2C_4$ two-spawns. To get the actual number of spawns we add the two-spawns to the four-spawns $$C_a = C_2 + C_4 = (C_e - 2C_4) + C_4 = C_e - C_4$$ Plugging the second equation into the first and solving for $C_4$ we find out that $$C_4 = \frac{p_4C_e}{1+p_4}$$ Lastly, since each of these $4$'s that spawned does not give us $4$ points (we assumed we got the points for all $4$ tiles), our expected score ($S_x$) is the estimated score ($S_e$) minus four times the number of $4$'s that spawned: $$S_x = S_e - 4C_4 = S_e - 4\frac{p_4C_e}{1+p_4}$$ Since we know $p_4$ and we know how to find $S_e$ (see Part 1) and $C_e$, we can compute our expected score. I played approximately 30 games with this and the average relative error for these guesses is about $0.2\%$. The results were even better once you scored above $10,000$. Only averaging the games with scores higher than $10,000$ (which was all but four of them), the average relative error was about $0.1\%$. Pretty cool results!
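To make this concrete, here's a minimal sketch of the whole estimate (the board below is made up, and $p_4$ is the roughly $15.3\%$ figure from above):

from math import log2

def tile_score(a):
   ## Points accumulated building a tile of face value a, assuming only 2's spawn
   ## (s_a = a*(log2(a) - 1) from Part 1)
   return 0 if a <= 2 else a * (log2(a) - 1)

def expected_score(board, p4=0.153):
   ## board is a list of face values; 0 marks an empty cell
   tiles = [a for a in board if a >= 2]
   S_e = sum(tile_score(a) for a in tiles)   ## estimated score (only 2's spawn)
   C_e = sum(a // 2 for a in tiles)          ## estimated number of spawns
   C_4 = p4 * C_e / (1 + p4)                 ## estimated number of 4-spawns
   return S_e - 4 * C_4

## A made-up board snapshot
board = [1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 2, 0, 0, 0, 0, 0]
print(round(expected_score(board)))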

I said I would tell you how I came about the probability of a four-spawn of $15.3\%$. So I played a bunch of games and compared the actual score of the game to what I estimated it would be if there were only $2$ tiles that spawned. The difference between these two divided by four is the actual number of $4$ tiles that spawned. With that I used the two equations above and calculated what $p_4$ was!

There are a few more questions about 2048 that I still have. What is the highest score one can get? What is the lowest score? These questions (and others if I think of more) will be discussed in Part 3.

Friday, May 30, 2014

Exhaustive List of the Sums of Four-Squares

I'd recently been asked to assist on some research on equilateral triangles in $\mathbb{Z}^4$. The project required coming up with a list of all of the lattice points $(i,j,k,l)$ on a hypersphere of radius $\sqrt{d}$ for $d\in\mathbb{N}$. Basically, this reduces to the problem of finding all sets of four integers $(i,j,k,l)$ such that $i^2+j^2+k^2+l^2=d$.

This is an old problem and in 1770 Lagrange proved that every natural number $d$ can be represented as the sum of four squares [1]. So we know that we should always find at least one. Additionally, in 1834 Jacobi determined how many solutions exist [2]: $$N(d)=8\sum_{4\nmid m|d}m$$ We can use that to check to see if our program can find all the points (at least, the right number of points). Of course for the smaller values of $d$, a complete check on the lattice can be performed from $-\sqrt{d}$ to $\sqrt{d}$ in all four coordinate directions. However, this has a complexity of $O(d^2)$.
from math import sqrt

def sum_squares(d):

   sqrt_d = int(sqrt(d))

   ## This list will hold all of the points
   points = []

   ## Nested for loops to check
   ## Add 1 because of the exclusivity of the bound of range() function
   for i in range(-sqrt_d,sqrt_d + 1):
      for j in range(-sqrt_d,sqrt_d + 1):
         for k in range(-sqrt_d,sqrt_d + 1):
            for l in range(-sqrt_d,sqrt_d + 1):
               if i**2 + j**2 + k**2 + l**2 == d:
                  points.append([i,j,k,l])

   return points

However, this gets really slow really quickly! Especially if I wanted to run this for $d$ in the tens of thousands. Better results can be obtained when we realize that our $l$ loop is completely unnecessary. Since we would already have guesses for $i, j,$ and $k$ and we know $d$, we can solve for $l$ and check to see if it is an integer. This reduces the complexity to $O(d^{1.5})$. We have to do a few things to make sure that the sqrt() function only takes a positive number and that we include both the positive and negative root, but the updated code is as follows.
from math import sqrt

def sum_squares(d):

   sqrt_d = int(sqrt(d))

   ## This list will hold all of the points
   points = []

   ## Nested for loops to check
   ## Add 1 because of the exclusivity of the bound of range() function
   for i in range(-sqrt_d,sqrt_d + 1):
      for j in range(-sqrt_d,sqrt_d + 1):
         for k in range(-sqrt_d,sqrt_d + 1):
            l_squared = d - i**2 - j**2 - k**2
            if l_squared >= 0:
               l = sqrt(l_squared)
               if l.is_integer():
                  points.append([i,j,k,int(l)])
                  if l != 0:
                     points.append([i,j,k,-int(l)])

   return points

Even better results can be obtained if you only check the first orthant and permute the sign of the results into the other 15 orthants. This will run in slightly more than $1/16$ the amount of time, but will still have the same complexity. However, depending on how many first orthant points we find, the permutation can take a significant amount of time and I will address the permutations more in a different post. Looking in the first orthant only, we eliminate the negative root for $l$ in the last line of the nested loops.

from math import sqrt

## This list contains the 16 sign permutations
signs = [[ 1, 1, 1, 1], [ 1, 1, 1,-1], [ 1, 1,-1, 1], [ 1, 1,-1,-1],
         [ 1,-1, 1, 1], [ 1,-1, 1,-1], [ 1,-1,-1, 1], [ 1,-1,-1,-1],
         [-1, 1, 1, 1], [-1, 1, 1,-1], [-1, 1,-1, 1], [-1, 1,-1,-1],
         [-1,-1, 1, 1], [-1,-1, 1,-1], [-1,-1,-1, 1], [-1,-1,-1,-1]]

def sum_squares(d):

   ## Add 1 because of the exclusivity of the bound of range() function
   sqrt_d = int(sqrt(d)) + 1

   ## This list will hold the points in the first orthant
   points = []

   ## This list will hold the points from all orthants
   final = []

   ## Nested for loops to check
   for i in range(sqrt_d):
      for j in range(sqrt_d):
         for k in range(sqrt_d):
            l_squared = d - i**2 - j**2 - k**2
            if l_squared >= 0:
               l = sqrt(l_squared)
               if l.is_integer():
                  points.append([i,j,k,int(l)])

   ## For each point found in the first orthant
   for m in range(len(points)):

      ## This list will hold all sign permutations for the mth point
      ## in the first orthant
      point_signs = []

      ## For each of the possible sign permutations
      for i in range(len(signs)):
         tmp = [points[m][0] * signs[i][0], points[m][1] * signs[i][1],
                points[m][2] * signs[i][2], points[m][3] * signs[i][3]]

         ## To avoid duplicates with points on the axes
         ## i.e. 0 * 1 = 0 * -1
         if tmp not in point_signs:
            point_signs.append(tmp)

      final = final + point_signs

   return final

However, this still has the same complexity. For larger values of $d$ something different must be done to reduce complexity! One such way to reduce complexity is to search for only the basic representation of the lattice points in the first orthant. The basic representation is the representation $(i,j,k,l)$ such that $i\geq j\geq k\geq l \geq 0$. For example, rather than finding $\{(2,1,1,0), (2,1,0,1), (2,0,1,1), \dots\}$, we only need to find $(2,1,1,0)$ and permute the order to produce the rest.

This further restricts the bounds on our for loops. For instance, we know that $\sqrt{d/4} \leq i \leq \sqrt{d}$. The upper bound is the same as before, however we have now bounded $i$ from below! To see why, we set $i=j=k=l=i_{min}$. This is $i$ at its lowest possible value and $j,k,$ and $l$ at their largest. This yields $$d=i^2+j^2+k^2+l^2 = 4{i_{min}}^2\implies i_{min} = \sqrt{d/4} $$ Each of the subsequent indices is then bounded above by both the previous index and the square root of the difference between $d$ and the sum of the squares of all the previous indices. For instance, $k \leq j$ and $k \leq \sqrt{d - (i^2 + j^2)}$. A stronger statement is therefore $k \leq \min\left(j, \sqrt{d - (i^2 + j^2)}\right)$.

from math import sqrt

signs = [[ 1, 1, 1, 1], [ 1, 1, 1,-1], [ 1, 1,-1, 1], [ 1, 1,-1,-1],
         [ 1,-1, 1, 1], [ 1,-1, 1,-1], [ 1,-1,-1, 1], [ 1,-1,-1,-1],
         [-1, 1, 1, 1], [-1, 1, 1,-1], [-1, 1,-1, 1], [-1, 1,-1,-1],
         [-1,-1, 1, 1], [-1,-1, 1,-1], [-1,-1,-1, 1], [-1,-1,-1,-1]]

orders = [[0,1,2,3], [0,1,3,2], [0,2,1,3], [0,2,3,1], [0,3,1,2], [0,3,2,1],
          [1,0,2,3], [1,0,3,2], [1,2,0,3], [1,2,3,0], [1,3,0,2], [1,3,2,0],
          [2,0,1,3], [2,0,3,1], [2,1,0,3], [2,1,3,0], [2,3,0,1], [2,3,1,0],
          [3,0,1,2], [3,0,2,1], [3,1,0,2], [3,1,2,0], [3,2,0,1], [3,2,1,0]]

def sum_squares(d):
   
   ## Add 1 because of the exclusivity of the bound of range() function
   sqrt_d = int(sqrt(d)) + 1
   sqrt_d4 = int(sqrt(d/4))
   
   ## This list will hold the basic points in the first orthant
   points = []

   ## This list will hold the points from all orthants
   final = []
   
   ## Nested for loops to check
   for i in range(sqrt_d4, sqrt_d):
      sum_i = d - i**2
      for j in range(min(i,int(sqrt(sum_i)))+1):
         sum_j = sum_i - j**2
         for k in range(min(j,int(sqrt(sum_j)))+1):
            sum_k = sum_j - k**2
            l = int(sqrt(sum_k))
            if l**2 == sum_k and l <= k:
               points.append([i,j,k,l])

   ## For each basic point found in the first orthant
   for m in range(len(points)):
      ## This list will hold all order permutations for the
      ## mth basic point
      point_orders = []

      ## For each of the possible order permutations
      for n in range(len(orders)):
         tmp = [points[m][orders[n][0]],
                points[m][orders[n][1]],
                points[m][orders[n][2]],
                points[m][orders[n][3]]]

         ## To avoid duplicates when a point has the same value in two positions
         ## i.e. swapping the two 1's in [2,1,1,0] gives the same point
         if tmp not in point_orders:
            point_orders.append(tmp)

      for n in range(len(point_orders)):
         
         ## This list will hold all sign permutations for the mth point
         ## in the first orthant
         point_signs = []
 
         ## For each of the possible sign permutations
         for o in range(len(signs)):
            tmp = [point_orders[n][0] * signs[o][0],
                   point_orders[n][1] * signs[o][1],
                   point_orders[n][2] * signs[o][2],
                   point_orders[n][3] * signs[o][3]]
 
            ## To avoid duplicates with points on the axes
            ## i.e. 0 * 1 = 0 * -1
            if tmp not in point_signs:
               point_signs.append(tmp)

         final = final + point_signs
            
   return final

But this is still pretty much the same complexity. I still wanted to get something faster. I did some research online and was unable to find anything. The closest I came was an article in Communications on Pure and Applied Mathematics by Rabin and Shallit [3] that used randomized algorithms to produce a single lattice point such that $x^2+y^2+z^2+w^2=d$. While fast (complexity of $O(\log^2 d)$), this method produces only one lattice point and there is no way to guarantee which point you get.

Finally, I stumbled upon a post on the Computer Science Stack Exchange [4] where a user suggested this nearly $O(d)$ algorithm. The way this algorithm works is that it populates a list of length $d+1$ with pairs of numbers whose squares sum to the index of that particular element. So for the $0$th index we will have the pair $[0,0]$ because $0^2+0^2=0$, for the $5$th index we will have the pair $[2,1]$ because $2^2+1^2=5$, and for the $25$th index we will have the pairs $[5,0]$ and $[4,3]$ because $5^2+0^2=4^2+3^2=25$. Now, we know that the pairs in the $0$th location and the pairs in the $d$th location will have a four-square sum of $d$, and the pairs in the $5$th location and the pairs in the $d-5$th location will have a four-square sum of $d$. We do a meet-in-the-middle loop through this list and we have our basic representations!

For example, consider $d=6$. We have the following list of pairs, which we can generate in $O(d)$ time. $$\begin{array}{|c|c|c|c|c|c|c|c|} \hline Index & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline Pairs & [0,0] & [1,0] & [1,1] & \phantom{[0,0]} & [2,0] & [2,1] & \phantom{[0,0]}\\ \hline \end{array}$$ Now look at the pair in index $1$, $[1,0]$, and the pair in index $5$, $[2,1]$. Because $1+5=6$, we know the sum of the squares of all four of the numbers in these two pairs will be $6$: $2^2+1^2+1^2+0^2=6$. A wonderfully constructed list! The code for this method is below.

from math import sqrt

signs = [[ 1, 1, 1, 1], [ 1, 1, 1,-1], [ 1, 1,-1, 1], [ 1, 1,-1,-1],
         [ 1,-1, 1, 1], [ 1,-1, 1,-1], [ 1,-1,-1, 1], [ 1,-1,-1,-1],
         [-1, 1, 1, 1], [-1, 1, 1,-1], [-1, 1,-1, 1], [-1, 1,-1,-1],
         [-1,-1, 1, 1], [-1,-1, 1,-1], [-1,-1,-1, 1], [-1,-1,-1,-1]]

orders = [[0,1,2,3], [0,1,3,2], [0,2,1,3], [0,2,3,1], [0,3,1,2], [0,3,2,1],
          [1,0,2,3], [1,0,3,2], [1,2,0,3], [1,2,3,0], [1,3,0,2], [1,3,2,0],
          [2,0,1,3], [2,0,3,1], [2,1,0,3], [2,1,3,0], [2,3,0,1], [2,3,1,0],
          [3,0,1,2], [3,0,2,1], [3,1,0,2], [3,1,2,0], [3,2,0,1], [3,2,1,0]]

def sum_squares(d):
   ## Element i of sum_pairs will be populated with all pairs [a,b]
   ## such that a, b < sqrt(d) and a*a + b*b = i
   sum_pairs = [[] for i in range(d+1)]

   ## Fill sum_pairs
   for a in range(0,int(sqrt(d)) + 1):
      for b in range(0,min(a + 1, int(sqrt(d - a**2)) + 1)):
         i = a*a + b*b
         if i <= d:
            sum_pairs[i].append([a,b])
         else:
            break

   ## This list will be populated with all points [i,j,k,l]
   ## such that i >= j >= k >= l >= 0 whose squares sum to d
   points = []

   ## This list will hold the points from all orthants
   final = []
   
   for i in range(0,d//2 + 1):
      for j in range(0,len(sum_pairs[i])):
         for k in range(0,len(sum_pairs[d-i])):
            tmp = [sum_pairs[i][j][0], sum_pairs[i][j][1],
                   sum_pairs[d-i][k][0], sum_pairs[d-i][k][1]]
            tmp.sort()
            if tmp not in points:
               points.append(tmp)

   ## For each basic point found in the first orthant
   for m in range(len(points)):
      ## This list will hold all order permutations for the
      ## mth basic point
      point_orders = []

      ## For each of the possible order permutations
      for n in range(len(orders)):
         tmp = [points[m][orders[n][0]],
                points[m][orders[n][1]],
                points[m][orders[n][2]],
                points[m][orders[n][3]]]

         ## To avoid duplicates when a point has the same value in two positions
         ## i.e. swapping the two 1's in [2,1,1,0] gives the same point
         if tmp not in point_orders:
            point_orders.append(tmp)

      for n in range(len(point_orders)):
         
         ## This list will hold all sign permutations for the mth point
         ## in the first orthant
         point_signs = []
 
         ## For each of the possible sign permutations
         for o in range(len(signs)):
            tmp = [point_orders[n][0] * signs[o][0],
                   point_orders[n][1] * signs[o][1],
                   point_orders[n][2] * signs[o][2],
                   point_orders[n][3] * signs[o][3]]
 
            ## To avoid duplicates with points on the axes
            ## i.e. 0 * 1 = 0 * -1
            if tmp not in point_signs:
               point_signs.append(tmp)

         final = final + point_signs
            
   return final
Ultimately, we've found a very quick way to find all lattice points on the hypersphere of radius $\sqrt{d}$. For the methods that we had to permute (either sign or position), most of the computation time is spent permuting. Therefore, I will spend some time working on a faster way to permute these basic representations while being mindful of repeats. Those methods will be discussed in another post.
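As a final aside, Jacobi's count from the top of the post makes for a handy correctness check. A naive sketch (the $O(d)$ divisor scan is just for illustration):

def jacobi_count(d):
   ## 8 times the sum of the divisors of d that are not multiples of 4
   return 8 * sum(m for m in range(1, d + 1) if d % m == 0 and m % 4 != 0)

## The length of any of the full-point lists above should match this count
d = 25
print(len(sum_squares(d)), jacobi_count(d))   ## expect 248 from both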
[1] https://en.wikipedia.org/wiki/Lagrange%27s_four-square_theorem
[2] https://en.wikipedia.org/wiki/Jacobi%27s_four-square_theorem
[3] M.O. Rabin, J.O. Shallit, Randomized Algorithms in Number Theory, Communications on Pure and Applied Mathematics (1986), no. S1, pp S239-S256.
[4] http://cs.stackexchange.com/questions/2988/how-fast-can-we-find-all-four-square-combinations-that-sum-to-n

Wrong Cancellation Gone Right

I was working on a Project Euler problem a few years back. The problem essentially asks you to find all two-digit fractions less than $1$ where cancelling the same digit from the top and the bottom leaves the value unchanged, such as $$\frac{49}{98}=\frac{4\cancel{9}}{\cancel{9}8}=\frac{4}{8}$$ Of course this is an incorrect method, but curiously, the equality remains. I was able to easily solve the problem with brute force. It's not a particularly computationally intensive problem. But when I shared the fun problem and results with a roommate of mine, he posed an interesting follow-up question. Can we extend this cancellation any further? And basically, we found that $$\frac{4}{8}=\frac{49}{98}=\frac{499}{998}=\frac{4999}{9998}=\cdots$$ And it turns out that this worked for any number of $9$'s in that location and for any of the two-digit fractions that were found. So, we generalize the problem:

For $a,b,c\in\mathbb{N}$ such that $0\leq a,b,c\leq 9$ $$\frac{a}{b}=\frac{10\cdot a+c}{10\cdot c + b} \implies \frac{a}{b}=\frac{10^n\cdot a + 10^{n-1}\cdot c + \cdots + 10 \cdot c +c}{10^n \cdot c + 10^{n-1}\cdot c + \cdots + 10\cdot c + b}$$ Or, in sigma notation $$\frac{a}{b}=\frac{10^n\cdot a + c\cdot \sum_{i=0}^{n-1}{10^i}}{c\cdot \sum_{i=1}^{n}{10^i} + b}$$ Through algebra (cross multiply, group, and divide), our premise implies that $$c=\frac{9ab}{10a-b}$$ So, consider the right hand side of the conclusion and substitute the above equality for $c$. $$\frac{10^n\cdot a + c\cdot \sum_{i=0}^{n-1}{10^i}}{c\cdot \sum_{i=1}^{n}{10^i} + b}=\frac{10^n\cdot a + \frac{9ab}{10a-b}\cdot\sum_{i=0}^{n-1}{10^i}}{\frac{9ab}{10a-b}\cdot \sum_{i=1}^{n}{10^i} + b}$$ Multiply top and bottom by $10a-b$ and get $$\frac{\left(10a-b\right)10^n\cdot a + 9ab\cdot\sum_{i=0}^{n-1}{10^i}}{9ab\cdot \sum_{i=1}^{n}{10^i} + \left(10a-b\right)b}$$ Distribute into the parentheses $$\frac{10^{n+1}\cdot a^2 - 10^n\cdot a b + 9ab\cdot \sum_{i=0}^{n-1}{10^i}}{9ab\cdot\sum_{i=1}^{n}{10^i}+10ab-b^2}$$ Factor out $-ab$ from two terms on the top and $ab$ from two terms on the bottom. $$\frac{10^{n+1}\cdot a^2 - ab\left[10^n - 9\cdot \sum_{i=0}^{n-1}{\left(10^i\right)}\right]}{ab\left[9\cdot\sum_{i=1}^{n}{\left(10^i\right)}+10\right]-b^2}$$ Looking at the bracketed term on top, we have $n$ nines subtracted from the number $1$ followed by $n$ zeros, which is just $1$. For example $$\begin{array}{c} 10-9=1\\ 100-99=1\\ 1000-999=1\\ \vdots \end{array}$$ Then looking at the bracketed term on the bottom, we have $$9\cdot\sum_{i=1}^{n}{\left(10^i\right)}+10=9\cdot\sum_{i=1}^{n}{\left(10^i\right)}+9+1=9\cdot\sum_{i=0}^{n}{\left(10^i\right)}+1$$ which is now $n+1$ nines added to $1$, which will be a $1$ followed by $n+1$ zeros, or $10^{n+1}$. For example $$\begin{array}{c} 9+1=10\\ 99+1=100\\ 999+1=1000\\ \vdots \end{array}$$ Substituting these in we get $$\frac{10^{n+1}\cdot a^2 -ab}{ab\cdot 10^{n+1} - b^2}$$ Finally, factor out $a$ on top and $b$ on the bottom and cancel the rest to get $$\frac{a\left(10^{n+1}\cdot a - b\right)}{b\left(10^{n+1}\cdot a - b\right)}=\frac{a}{b}$$ Therefore, if we can cancel the digits in the (incorrect) manner we did, we can insert as many copies of the shared digit as we'd like and still maintain equality.
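For what it's worth, here's a short brute-force sketch of the original two-digit search plus a check of the extension (just an illustration, not the Project Euler write-up):

from fractions import Fraction

## Two-digit fractions num/den < 1 of the form (10a+c)/(10c+b) that survive the
## bogus cancellation of c
found = []
for num in range(10, 100):
   for den in range(num + 1, 100):
      a, c1 = divmod(num, 10)   ## num = 10*a + c1
      c2, b = divmod(den, 10)   ## den = 10*c2 + b
      if c1 == c2 and c1 != 0 and b != 0 and Fraction(num, den) == Fraction(a, b):
         found.append((num, den))
print(found)   ## [(16, 64), (19, 95), (26, 65), (49, 98)]

## The extension: padding with more of the shared digit preserves the value
print(Fraction(4, 8) == Fraction(49, 98) == Fraction(499, 998) == Fraction(4999, 9998))   ## True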

2048 - Scoring (Part 1)

As everyone probably knows, the game $2048$ has become extremely popular (Even my mother plays it!). The game is simple; smash tiles of the same number together to make the next power of two with the goal of reaching the $2048$ tile.

This will be one of many posts I make about this game because I am fascinated with some of the math behind it. One of the first questions I want to answer is: How is the game scored? Well, anyone watching the score as they play can see that when you make the $4$ tile, you get $4$ points. When you make the $1024$ tile, you get $1024$ points. Pretty straightforward. But, in order to make the $1024$ tile (and get your $1024$ points) you need to first make two of the $512$ tiles and get $512 + 512 = 1024$ more points. And to make a pair of $512$ tiles, you need to make a total of four $256$ tiles.

So, the next question is: What is the actual value for a tile? Basically, how many points have you actually gotten from any given tile? I will first make the assumption that only $2$'s will spawn and will address the incorrectness at a later point. This means that every $4$ tile in play was created by crashing two $2$ tiles and has given us $4$ points. With $s_a$ representing the score of the tile with a face value $a$, we have the recurrence relation $$ s_a=a+2s_{a/2} $$ because we get the value of the tile we create ($a$) and the score of two of the previous tiles ($a/2$). With this relation and the fact that $s_2 = 0$ (because it spawned giving no points) we can tabulate our relation: $$ \begin{array}{|c|c|c|} \hline \mathrm{\mathbf{a}} & \mathrm{\mathbf{Relation}} & \mathrm{\mathbf{s_a}}\\ \hline 2 & 0 & 0\\ 4 & 4+2(0) & 4\\ 8 & 8 + 2(4) & 16\\ 16 & 16 + 2(16) & 48\\ 32 & 32 + 2(48) & 128\\ 64 & 64 + 2(128) & 320\\ 128 & 128 + 2(320) & 768\\ \hline \end{array} $$ We can keep going, but I wanted a better way. Is there a closed form solution for $s_a$? Can we find $s_a$ without finding the score of every tile before it? I was looking for a pattern and came across this: $$ \begin{array}{|c|c|c|} \hline \mathrm{\mathbf{a}} & \mathrm{\mathbf{s_a}} & \mathrm{\mathbf{Pattern}} \\ \hline 2 & 0 & 2(0)\\ 4 & 4 & 4(1)\\ 8 & 16 & 8(2)\\ 16 & 48 & 16(3)\\ 32 & 128 & 32(4)\\ 64 & 320 & 64(5)\\ 128 & 768 & 128(6)\\ \hline \end{array} $$ where $s_a = a\left[\log_2(a)-1\right]$. But is this true in general? Turns out it is true and we can prove it with induction rather quickly. For simpler algebra, we'll define $n$ such that $2^n = a$, or rather, the $n$th tile has a face value of $2^n$. Further, we define $t_n$ to be equal to $s_{2^n}$ and see that our closed form guess is $t_n = (n-1)2^n$ and our recurrence relation is $t_n = 2^n + 2t_{n-1}$.

The base case of $n=1$ is trivially true. We want to show that if $t_n = (n-1)2^n$ then $t_{n+1} = n2^{n+1}$. Consider $t_{n+1}$: $$ t_{n+1} = 2^{n+1} + 2t_n = 2^{n+1} + 2(n-1)2^n = 2^{n+1} + (n-1)2^{n+1} = n2^{n+1} $$ Pretty straightforward! So, at the end of your games you don't have to waste your time and look at the top corner of the screen for your score. You can simply find $s_a$ for all 16 tiles on the screen and add them together! We will denote $S_e$ as the total estimated score for the game: $$S_e = \sum_{16\;Tiles}s_a$$ But, you'll notice that your estimated scores are always higher than the score the game shows (but only by around $2\%$ to $6\%$ based on how high your score is). This is because we assumed we got the points for every $4$ tile, even the ones that spawned directly. So how can we correct for the fact that sometimes a $4$ spawns? The answer is in Part 2.
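Here's a tiny sketch comparing the recurrence and the closed form (just a sanity check, not part of the original tabulation):

def score_recurrence(a):
   ## s_a = a + 2*s_(a/2), with s_2 = 0
   return 0 if a == 2 else a + 2 * score_recurrence(a // 2)

def score_closed(a):
   ## s_a = a*(log2(a) - 1); a.bit_length() - 1 gives n where a = 2^n
   n = a.bit_length() - 1
   return a * (n - 1)

for a in [2, 4, 8, 16, 32, 64, 128, 1024, 2048]:
   print(a, score_recurrence(a), score_closed(a))   ## the two columns agree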

Saturday, May 3, 2014

Converging Sequence of Unusually Summed Fractions

While reading Measurement by Paul Lockhart, I came across his explanation for the length of the diagonal in the unit square. He first takes some "guesses" for the number $d$ that squares to $2$: $$\frac{3}{2}, \frac{7}{5}, \frac{10}{7}, \frac{17}{12}$$ Of course these are incorrect, but the sequence intrigued me. I was wondering why he picked these particular guesses. I noticed a pattern; the numerator of each term is the sum of the numerators of the previous two terms and the denominator of each term is the sum of the denominators of the previous two terms. I thought that maybe if we followed the same pattern we may converge to $\sqrt{2}$. $$ \begin{array}{|c|c|} \hline \mathrm{\mathbf{Fraction}} & \mathrm{\mathbf{Approx.}} \\ \hline 3/2 & 1.50000\\ 7/5 & 1.40000\\ 10/7 & 1.42857\\ 17/12 & 1.41667\\ 27/19 & 1.42105\\ 44/31 & 1.41935\\ 71/50 & 1.42000\\ 115/81 & 1.41975\\ 186/131 & 1.41985\\ 301/212 & 1.41981\\ 487/343 & 1.41983\\ 788/555 & 1.41982\\ \hline \end{array} $$ Clearly, this is not approaching $\sqrt{2}$. But it does appear to be converging to something close to $1.41982$. What exactly is going on here?

Let's generalize the problem a little bit more: Define the sequences $\{a_n\}$ and $\{b_n\}$ such that $a_n = a_{n-1} + a_{n-2}$ and $b_n = b_{n-1} + b_{n-2}$, with $a_1 \geq a_0 > 0$ and $b_1 \geq b_0>0$ given. Define another sequence $\{c_n\}$ such that $c_n = a_n/b_n$.

Some questions of particular interest include: Does $\{c_n\}$ always converge? If it converges, what is $\lim_{n\to\infty}c_n$? Can we find a closed form solution for $c_n$?

The first thing I noticed is that $c_n$ is between $c_{n-1}$ and $c_{n-2}$. To show this, assume, without loss of generality, that $c_{n-2}\leq c_{n-1}$. This means that $$ \frac{a_{n-2}}{b_{n-2}} \leq \frac{a_{n-1}}{b_{n-1}} \implies a_{n-2}b_{n-1} \leq a_{n-1}b_{n-2} $$ Consider $a_{n-2}(b_{n-2}+b_{n-1})$. $$ \begin{array}{r l} a_{n-2}(b_{n-2}+b_{n-1}) & = a_{n-2}b_{n-2}+a_{n-2}b_{n-1} \\ & \leq a_{n-2}b_{n-2} + b_{n-2}a_{n-1} \\ & = b_{n-2}(a_{n-2}+a_{n-1}) \end{array}$$ And from here it's clear to see the following inequality $$\frac{a_{n-2}}{b_{n-2}} \leq \frac{a_{n-2}+a_{n-1}}{b_{n-2}+b_{n-1}} \implies c_{n-2} \leq c_{n}$$ Following similarly, we can look at $b_{n-1}(a_{n-2}+a_{n-1})$ to show $c_{n} \leq c_{n-1}$. Therefore we have $$ c_{n-2} \leq c_{n} \leq c_{n-1} $$ and we can conclude that $c_n$ is always between $c_{n-1}$ and $c_{n-2}$ (the other direction of "between" comes from the opposite assumption that $c_{n-2}\geq c_{n-1}$).

While an interesting result, this doesn't necessarily put us any closer to showing convergence. I attempted to bound the sequence between the subsequence of even indices (which is decreasing) and the subsequence of odd indices (which is increasing) and squeeze out a limit. I was able to prove the bound existed, but I was not able to prove convergence for either of the two subsequences. I may share some of this work in a later post, but this direction wasn't leading anywhere. So, I temporarily gave up on showing convergence and moved onto coming up with a closed form for $c_n$. And in searching for a closed form solution, I found the following proof for both a closed form solution and the convergence.

For notational purposes, I will index the Fibonacci sequence, $\{0,1,1,2,3,5,\ldots\}$, as follows: $$F_0 =0, \; F_1 = 1, \; F_2 = 1, \; F_3 = 2, \; F_4 = 3, \; F_5 = 5, \ldots$$ I will first show that $a_n = F_n a_1 + F_{n-1} a_0$ for all $n\geq 2$. Proceeding by induction, the base case of $n=2$ works out: $$a_2 = a_1 + a_0 = 1 \cdot a_1 + 1 \cdot a_0 = F_2 a_1 + F_1 a_0$$ Now, assume that $a_k = F_k a_1 + F_{k-1}a_0$ for all $2\leq k\leq n$. And we want to show that $a_{n+1} = F_{n+1}a_1 + F_n a_0$. $$ \begin{array}{rl} a_{n+1}& = a_n + a_{n-1} \\ &= \left(F_na_1 + F_{n-1}a_0\right) + \left(F_{n-1}a_1 + F_{n-2}a_0\right) \\ &= \left(F_n + F_{n-1}\right)a_1+\left(F_{n-1}+F_{n-2}\right)a_0 \\ &= F_{n+1}a_1 + F_n a_0 \end{array} $$ In the same manner we can show that $b_n = F_n b_1 + F_{n-1} b_0$. Based on this, we can show that $$c_n = \frac{a_n}{b_n} = \frac{F_n a_1 + F_{n-1} a_0}{F_n b_1 + F_{n-1} b_0}\cdot\frac{1/F_{n-1}}{1/F_{n-1}}=\frac{\left(F_n/F_{n-1}\right)a_1+a_0}{\left(F_n/F_{n-1}\right)b_1+b_0}$$ Based on the well known result that the ratio between adjacent Fibonacci numbers approaches the golden ratio, $\phi$, we can conclude that $$\lim_{n\to\infty}c_n = \frac{\phi a_1 + a_0}{\phi b_1 + b_0}$$ We have shown convergence!

Let's reconsider the original problem. Recall that $a_0 = 3, a_1 = 7, b_0 = 2, $ and $b_1 = 5$. Our convergence theorem shows that $$\lim_{n\to\infty}c_n = \frac{7\phi +3}{5\phi+2} \approx 1.41982$$ Exactly what we saw happening! One more interesting result that I found along the way is that $$\lim_{n\to\infty}\frac{a_n}{a_{n-1}}= \lim_{n\to\infty}\frac{b_n}{b_{n-1}}=\phi$$ Basically, any two positive starting numbers in a Fibonacci-esque sequence will always have an adjacent term ratio of $\phi$, just like the Fibonacci sequence.
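And a quick numerical sketch of the whole thing (just re-running the original guesses):

## Follow the sequence from the original starting values and compare with the limit
phi = (1 + 5 ** 0.5) / 2

a0, a1 = 3, 7
b0, b1 = 2, 5
for _ in range(30):
   a0, a1 = a1, a0 + a1
   b0, b1 = b1, b0 + b1

print(a1 / b1)                          ## ~1.419821...
print((7 * phi + 3) / (5 * phi + 2))    ## the same value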

Tuesday, April 15, 2014

Introduction

Mathjestic is just what it sounds like: a (poor) combination of the words "math" and "majestic". It was created as a way for me to write up and share some of the more intriguing math, programming, and engineering topics, problems, or inquiries that I come across in my academic pursuits.

My undergraduate background is in mechanical engineering and mathematics, but I have a great deal of experience in programming, as well. Topics will tend to focus around these three subject areas, but if I find content worthy (and that I'm knowledgeable enough about) then I will add them as well.