Knight in n, part 4: tensors

Previously in this series:

Welcome to the fourth installement of the Knight in n series. In part 3 we talked about the direct product of rings, and how they helped us solve the knight moves problem. This time yet another type of product is going to help in decomposing the algorithm to allow faster parts to be put in.

The tensor product of rings

In part three I introduced the direct product on rings, which is nothing more than a pair of numbers. Confusingly this operation is also called direct sum. To illustrate this name, take the direct sum/product of Array i a with Array j b. For every index i (within the bounds of the first array) there is a value of type a, and for every index j there is a value of type b. Instead of a pair of arrays, this could also be implemented as a single array with the type Array (Either i j) (Either a b). "Either" is just Haskell's way of saying "disjoint union" or "sum type", hence "direct sum".

There is another product operation that we can perform on two rings: the tensor product. Dually to the direct sum, the tensor product of Array i a and Array j b has type Array (i,j) (a,b). The definition is very simple: the array contains all pairs where the first part comes from the first array, and the second part comes from the second array.

Slightly more generally, we can use any combining function. The general tensor product of two arrays can be implemented as:

tensorWith :: (Ix i, Ix j) => (a -> b -> c) -> Array i a -> Array j b -> Array (i,j) c
tensorWith f a b
    = array ((alo,blo),(ahi,bhi))
      [ ((i,j), f x y) | (i,x) <- assocs a, (j,y) <- assocs b ]
  where (alo,ahi) = bounds a
        (blo,bhi) = bounds b

Usually elements are multiplied:

(><) :: (Ix i, Ix j, Num a) => Array i a -> Array j a -> Array (i,j) a
(><) = tensorWith (*)

The mathematical notation for this (><) operator is ⊗. Now an example: Here we take two 4-element vectors, their tensor product has 4*4=16 elements. The two vectors are "one dimensional*" objects, their tensor product is a "two dimensional" matrix.

A special case we will use often is the tensor product of an array with itself:

square x = x >< x

For example (using simple reflection of expressions which is now on hackage as Debug.SimpleReflect):

Knight4> square (listArray (0,2) [u,v,w])
listArray ((0,0),(2,2)) [u*u, u*v, u*w
                        ,v*u, v*v, v*w
                        ,w*u, w*v, w*w]

Interchange law

The tensor product and convolution operations satisfy the very useful interchange law:

!!!style="margin-top:.1em"!!!And since exponentiation is repeated convolution, also

For a proof sketch of this equation, compare the definitions of (><) and mulArray. Ignoring array bounds stuff, we have:

convolution:     [ ( i+j,  x*y) | (i,x) <- assocs a, (j,y) <- assocs b ]
tensor product:  [ ((i,j), x*y) | (i,x) <- assocs a, (j,y) <- assocs b ]

The only difference is in what happens to indices, with convolution the indices are added, with the tensor product a pair is formed. Now consider the interchange law. Informally, the indices of the left hand side are of the form (ia,ib)+(ic,id), and on the right hand side (ia+ic,ib+id). This corresponds exactly to the piecewise addition for Num (α,β).

The interchange law is often exploited to perform faster convolutions. For example, consider blurring an image by taking the convolution with a Gaussian blur kernel:
Performing this convolution requires O(n4) operations for an n by n image.

The two dimensional Gaussian blur kernel can be written as the tensor product of two one dimensional kernels, with a bit algebra this gives:

So now to blur an image we can perform two convolution, first with the horizontal kernel, and then with the vertical one:
This procedure needs only O(n3) operations.

Back to business

Blurring images is not what we are trying to do. Instead of convolution with the Gaussian blur kernel, we are interested in convolution with moveMatrix. We could try the same trick, finding an a such that moveMatrix == a >< a. Unfortunately, this is impossible.

But we can still get close, there is a way to write moveMatrix == square a + square b, well, almost. Actually, what we have is:

2 * moveMatrix
      0 2 0 2 0        1 1 0 1 1      1 -1  0 -1  1 
      2 0 0 0 2        1 1 0 1 1     -1  1  0  1 -1 
 ==   0 0 0 0 0   ==   0 0 0 0 0  -   0  0  0  0  0   ==   square a - square b
      2 0 0 0 2        1 1 0 1 1     -1  1  0  1 -1 
      0 2 0 2 0        1 1 0 1 1      1 -1  0 -1  1 


a,b :: Array Int Integer
a = listArray (-2,2) [1,1,0,1,1]
b = listArray (-2,2) [1,-1,0,-1,1]

Now we can start with pathsconv from last time:

pathsconv n ij = (moveMatrix ^ n) `safeAt` ij

Where safeAt is a safe array indexing operator, that returns 0 for indices that are out of bounds:

safeAt ar i
    | inRange (bounds ar) i = ar ! i
    | otherwise             = 0

Now let's do some algebraic manipulation:

    pathsconv n ij
= {- by definition of pathsconv -}
    (moveMatrix ^ n) `safeAt` ij
= {- by defintion of a and b -}
    ((square a - square b) `div` 2)^n `safeAt` ij -- division by 2 is pseudocode
= {- division does not depend on the index -}
    (square a - square b)^n `safeAt` ij `div` 2^n

We still cannot apply the interchange law, because the exponentiation (^n) is applied to the difference of two tensor products and not a single one. We can, however, expand this exponentation by the formula:

(a + b)^n = sum [ multinomial [na,nb] * a^na * b^nb | (na,nb) <- split n ]

This is just the usual binomial expansion, as in

Applying binomial expansion to our work-in-progress gives:

    (square a - square b)^n `safeAt` ij `div` 2^n
= {- binomial expansion -}
    sum [ multinomial [na,nb] * square a^na * (-square b)^nb
        | (na,nb) <- split n ]
    `safeAt` ij `div` 2^n
= {- (-square b)^nb == (-1)^nb * square b^nb -}
    sum [ multinomial [na,nb] * (-1)^nb
        * square a^na * square b^nb
        | (na,nb) <- split n ]
    `safeAt` ij `div` 2^n
= {- interchange law -}
    sum [ multinomial [na,nb] * (-1)^nb
        * square (a^na * b^nb)
        | (na,nb) <- split n ]
    `safeAt` ij `div` 2^n
= {- move `safeAt` inwards, since addition is pointwise -}
    sum [ multinomial [na,nb] * (-1)^nb
        * square (a^na * b^nb) `safeAt` ij
        | (na,nb) <- split n ]
    `div` 2^n

Fast indexing

Since square something already has n2 elements and the loop is performed n+1 times, this algorithm still requires O(n3) operations.

The only reason for calculating square (a^na * b^nb) is because we need the element at index ij. So instead of constructing a whole array, let's just calculate that single element:

-- square x `safeAt` ij  ==  x `squareAt` ij
x `squareAt` (i,j) = x `safeAt` i * x `safeAt` j

So the inner part of the algorithm becomes:

    square (a^na * b^nb) `safeAt` ij
= {- property of squareAt -}
    (a^na * b^nb) `squareAt` ij

We are still not there yet. Both a^na and b^nb have O(n) elements, so just calculating their convolution takes O(n2) work. But again, we need only two elements of the convolution, so we can define:

-- a * b `safeAt` i  ==  mulArrayAt a b i
mulArrayAt a b n = sum [ x * b `safeAt` (n-i) | (i,x) <- assocs a ]

And update squareAt accordingly:

mulSquareAt a b (i,j) = mulArrayAt a b i * mulArrayAt a b j

Finally we need a more efficient way to calculate all the powers of a and b. The iterate function can help us with that:

Knight4> iterate (*u) 1
[1, 1*u, 1*u*u, 1*u*u*u, 1*u*u*u*u, ...

Putting the pieces together gives a O(n2) algorithm for the knight moves problem:

pathstensor n ij
      = sum [ multinomial [na,nb] * (-1)^nb
            * mulSquareAt (powers_of_a !! na) (powers_of_b !! nb) ij
            | (na,nb) <- split n
       `div` 2^n
  where powers_of_a = iterate (*a) 1
        powers_of_b = iterate (*b) 1

Note that the savings we have made do not come directly from decomposing the moveMatrix. It is just that this decomposition allows us to see that we are computing all elements of am expensive product where a single one would do.

This post brings another order of improvement. Do you think you can do better than O(n2) time and O(n2) space complexity? If so I would like to hear.

*: The number of elements is often called the dimension of a vector. Here we use the term dimension to refer to the number of indices used, also known as the (tensor) order. So a 100*100 pixel image has dimension 10000 according to the first interpretation (the number of elements), but dimension two in the second interpretation (the number of indices).



Thanks for this beautiful insight. By translating this into PL J , I decided to replace the array-based polynomial approach into a list-based one using Henning Thielemanns poly. functions. This step gives a nice time and improvement.

scale s = map (s*)
add [] ys = ys add xs [] = xs add (x:xs) (y:ys) = x+y : add xs ys
mul [] _ = [] mul xs ys = foldr ((.(0:)).add.(flip scale xs)) [] ys
mulsq xs ys (i,j) = p1*p2 where ny = length ys hy = ny `div` 2 hx = (length xs)`div`2 ax = [-hx,-(pred hx) .. hx] xi = map ((hy+i)-) ax xj = map ((hy+j)-) ax ysi = [0..pred ny] p1 = sum $ zipWith (*) xs (map (\i -> if i`elem`ysi then ys!!i else 0) xi) p2 = sum $ zipWith (*) xs (map (\i -> if i`elem`ysi then ys!!i else 0) xj)
pascal = iterate (ap (zipWith (+) . (0 :)) (++ [0])) [1] altern n = take (n+1). drop (n `mod` 2) $ cycle [1,-1] mulnoms n = zipWith (*) (altern n) (pascal!!n)
nokps n (i,j) = sum ( zipWith (*) (mulnoms n) (zipWith (flip flip (i,j). mulsq) (take (n+1)$ pa) (reverse. take (n+1)$ pb)) ) `div` 2^n where pa = iterate (mul [1,1,0,1,1]) [1] pb = map (zipWith (*) (cycle[1,-1])) pa</pre>


(optional, will not be revealed)
Name of the lazy functional programming language I write about:
Use > code for code blocks, @code@ for inline code. Some html is also allowed.