## Project Euler Problem #14

### January 29, 2010

Read the details of the problem on the Project Euler website.

### Summary

Find the longest sequence using a starting number under one million.
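The sequence in question is the Collatz chain: halve an even number, map an odd n to 3n + 1, and stop at 1, counting both the starting term and the final 1 (so 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1 has length 10). A minimal chain-length function, here in Java since the post later benchmarks against a Java version (the class and method names are mine):

```java
public class Collatz {
    // Chain length for a Collatz start value, counting both the
    // start value itself and the terminating 1.
    static long chainLength(long n) {
        long len = 1;
        while (n > 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        System.out.println(chainLength(13)); // 10, as in the problem statement
    }
}
```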

### Solution

This one was hard. Not so much working out the algorithm, which is pretty much spelled out in the question, but simply getting Groovy to execute it in under 15 seconds.

Here is my first realistic attempt. I put in a cache to store partial results; initially I tried a dictionary/map and then a List, but their performance was lousy, so I reverted to a simple array. I also tinkered with CACHE_SIZE and found it was pretty much optimal to make it the same as the maximum value of i, as this saved another range validation when updating the cache.

```groovy
def ( maxlen, answer ) = [ 0, 0 ]

def CACHE_SIZE = 1000000
def cache = new int[CACHE_SIZE]

for (i in 1..<1000000) {
    long n = i
    def len = 1
    while (n > 1) {
        if ((n < CACHE_SIZE) && (cache[(int)n] != 0)) {
            len += cache[(int)n] - 1
            break
        }
        n = (n % 2 == 0) ? n / 2 : (3 * n + 1)
        len++
    }
    cache[i] = len
    if (len > maxlen) ( maxlen, answer ) = [ len, i ]
}
```

This ran in 102.6 seconds. (The ArrayList version of the cache, defined as [0] * CACHE_SIZE had been a lot slower than the native array, taking 144 seconds.) So nowhere near the 15 second target. A bit of profiling showed that the single calculation line was taking 81% of the runtime. Talk about a hotspot!

I couldn’t see how the algorithm could be significantly changed so some micro-optimisation was in order. I’d done low-level performance programming when I used to be interested in graphics programming back in the days of 66 MHz 80486 CPUs but was hoping that today’s high-level Java compilers would remove some of that burden.

My first change was to turn the arithmetic test (n % 2 == 0) into a logical one. I was only interested in the low bit, so I could replace it with ((n & 1) == 0). This brought the runtime down to 97.1 seconds – a saving of 5% – so not too impressive.
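The equivalence behind this swap is easy to check: for non-negative values, the low bit alone determines parity. A quick Java sanity check (class and method names are mine):

```java
public class ParityCheck {
    // For non-negative longs, ((n & 1) == 0) gives the same answer
    // as (n % 2 == 0): the low bit alone decides parity.
    static boolean parityAgrees(long n) {
        return (n % 2 == 0) == ((n & 1) == 0);
    }

    public static void main(String[] args) {
        for (long n = 0; n < 1_000_000; n++) {
            if (!parityAgrees(n)) throw new AssertionError("mismatch at " + n);
        }
        System.out.println("bit test matches modulo test");
    }
}
```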

Next I changed the (n / 2) expression to n.intdiv(2). As n is defined as a long (and has to be, as it exceeds 32 bits for some values of i) and 2 is an integer constant, I had figured Groovy would force integer division anyway, but I put it in explicitly in any case. This made a HUGE difference, bringing the runtime down to 20.3 seconds – a reduction of 79%! (Groovy's / operator actually performs BigDecimal division on integral operands, which is where all that time was going.)

So, if division was so expensive, could the same be said for multiplication? I next changed the (3 * n + 1) statement to (n + n + n + 1). This got the execution time down to 19.7 seconds – another 3% shaved off. Not much but I’ll take what I can get.

I went back to the n.intdiv(2) statement. One of the tricks when writing in assembler/C is to use a right- or left-shift to divide or multiply by two, though this can have implications for the sign bit. I thought I'd give it a go and changed n.intdiv(2) to (n >> 1). This dropped the time down to 19.3 seconds – barely a 2% saving but worth having. (I did also try the unsigned right-shift operator “>>>” but this had no material effect either way.)
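The sign-bit implication is concrete: an arithmetic right shift rounds toward negative infinity while integer division truncates toward zero, so they differ on negative values. That is harmless here, since n is always positive. A small Java illustration (class and method names are mine):

```java
public class ShiftDivide {
    static long viaDivide(long n) { return n / 2; }  // truncates toward zero
    static long viaShift(long n)  { return n >> 1; } // rounds toward negative infinity

    public static void main(String[] args) {
        // Identical for non-negative values:
        System.out.println(viaDivide(1_000_000) + " " + viaShift(1_000_000)); // 500000 500000
        // But they disagree on negative odd values:
        System.out.println(viaDivide(-7) + " " + viaShift(-7)); // -3 -4
    }
}
```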

I stopped and had a bit of a think about the problem. I’d reduced the maths overhead as low as I could. It was now only 33% of the runtime. How could I do less of the other 67%? It was only possible to arrive at the termination condition from an even number, a “2” (obviously). The formula used meant that any odd value of n would be changed to a non-terminal even number which would then be immediately divided by 2 on the next pass. So, could these steps be combined and reduce the runtime of the inner while-loop?

The answer was, of course, yes. With the inner-loop replaced with the code below it runs in 15.2 seconds – some 21% faster than before. Still fractionally outside my 15 second cut-off but 85% faster than my initial attempt! The Java equivalent runs in 196 ms – 77x faster – and, for what it matters, a Delphi 6 version takes a mere 117 ms.

```groovy
if ((n & 1) == 0) {
    n >>= 1
    len++
}
else {
    n = (n + n + n + 1) >> 1
    len += 2
}
```
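The observation behind the combined step can be checked mechanically: for odd n, 3n + 1 is always even, so (3n + 1) >> 1 performs two Collatz steps at once while still visiting every cacheable top-of-loop value. A Java cross-check of my own confirming that the combined step yields the same chain lengths as the one-step-at-a-time version:

```java
public class CombinedStep {
    // Naive: one Collatz step per loop iteration.
    static long naiveLength(long n) {
        long len = 1;
        while (n > 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            len++;
        }
        return len;
    }

    // Combined: an odd n always yields an even 3n + 1, so halve it
    // immediately and count two steps in one iteration.
    static long combinedLength(long n) {
        long len = 1;
        while (n > 1) {
            if ((n & 1) == 0) { n >>= 1; len++; }
            else { n = (n + n + n + 1) >> 1; len += 2; }
        }
        return len;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 10_000; n++) {
            if (naiveLength(n) != combinedLength(n))
                throw new AssertionError("mismatch at " + n);
        }
        System.out.println("combined step agrees with naive step");
    }
}
```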

How about multithreading the solution? I’ve got 2 cores on my CPU – so let’s use them. Groovy makes this very easy as Closures are Runnables. I took some liberties with thread safety around the cache which the threads share for efficiency and the division of labour between threads was fairly arbitrary. I could have used a queue with the threads pulling work off as they became free, but as this was a hack anyway, these conditions sufficed for my purpose.

Here’s my final offering:

```groovy
def solve14(start, end, cache, results) {

    def ( maxlen, answer ) = [ 0, 0 ]
    def cache_size = cache.size()

    for (i in start..<end) {
        long n = i
        def len = 1
        while (n > 1) {
            if ((n < cache_size) && (cache[(int)n] != 0)) {
                len += cache[(int)n] - 1
                break
            }
            if ((n & 1) == 0) {
                n >>= 1
                len++
            }
            else {
                n = (n + n + n + 1) >> 1
                len += 2
            }
        }
        cache[i] = len
        if (len > maxlen) ( maxlen, answer ) = [ len, i ]
    }
    results[answer] = maxlen
}

def results = [:] as Hashtable
def CACHE_SIZE = 1000000
def cache = new int[CACHE_SIZE]

def threads = [
    new Thread( { solve14(670000, 1000000, cache, results) } ),
    new Thread( { solve14(0, 670000, cache, results) } )
]

threads*.start()
threads*.join()

println results.entrySet().max { it.value }.key
```

This completes in 10.7 seconds – quite comfortably inside my, admittedly somewhat arbitrary, limit of 15 seconds! It’s marginally faster (by 1 second) to start the upper bound thread first. Although I’ve cheated by using multiple threads to get below the cut-off, it’s not a bad result given how poor initial performance was.

### Conclusion

If calculation performance really is needed, then steer clear of Groovy. If you do stick with Groovy, there are a few manual tricks you can use to make it go faster, as Groovy won't do them for you!

I’m not going to muck about like this again for computationally intensive solutions. If Java will do the job within the timeframe then I’ll use it, either in whole or in part. After all, what’s the point of having tight Java integration if you don’t use it from time to time?
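For reference, the single-threaded Java equivalent mentioned earlier might look roughly like this. This is my own sketch (class and method names are mine), using the same int[] cache and combined odd step as the final Groovy version:

```java
public class Euler14 {
    // Longest Collatz chain for start values below limit.
    // Returns { start value with the longest chain, its chain length }.
    static long[] solve(int limit) {
        int[] cache = new int[limit]; // cache[n] = chain length of n; 0 = not yet known
        long maxlen = 0, answer = 0;
        for (int i = 1; i < limit; i++) {
            long n = i;
            long len = 1;
            while (n > 1) {
                if (n < limit && cache[(int) n] != 0) {
                    len += cache[(int) n] - 1; // splice in the cached tail
                    break;
                }
                if ((n & 1) == 0) { n >>= 1; len++; }        // even: halve
                else { n = (n + n + n + 1) >> 1; len += 2; } // odd: 3n+1 then halve
            }
            cache[i] = (int) len;
            if (len > maxlen) { maxlen = len; answer = i; }
        }
        return new long[] { answer, maxlen };
    }

    public static void main(String[] args) {
        long[] r = solve(1_000_000);
        System.out.println(r[0] + " (chain length " + r[1] + ")");
    }
}
```

On a modern JVM this finishes in well under a second, consistent with the 196 ms figure quoted above.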