# Advent of Code–Decompression Length

- Posted in:
- F#
- Advent of Code

Today’s Advent of Code challenge was decompressing a sort-of run length encoded string, for which a recursive solution was the obvious choice.

The puzzle itself only asked for the length of the decompressed string to be calculated, so it was quite possible and permissible to solve the challenge without actually decompressing the string at all. I decided that I wanted my solution to be able to decompress the string, even though technically it wasn’t necessary.

So I created a recursive function that works its way through the string decoding as it goes, and yielding small strings which can be concatenated to produce the decompressed result. The benefits are that I can use this to check the decoding of test inputs, and that the entire decoded string doesn’t need to be stored in memory – we could sum the lengths of each element in the sequence as they are produced. It uses the convenient `yield!`

operator to yield every element from a sequence:

let input = System.IO.File.ReadAllText (__SOURCE_DIRECTORY__ + "\\input.txt") open System.Text.RegularExpressions let rec expand isV2 parsePos (input:string) = seq { let sub = input.Substring(parsePos) let m = Regex.Match(sub, @"\((\d+)x(\d+)\)") if m.Success then let chars = int m.Groups.[1].Value let repeats = int m.Groups.[2].Value let repeat = input.Substring(parsePos + m.Index + m.Length, chars) yield sub.Substring(0,m.Index) if isV2 then for i in 1..repeats do yield! expand isV2 0 repeat else yield [for i in 1..repeats -> repeat] |> List.fold (+) "" yield! expand isV2 (parsePos+m.Index+m.Length+chars) input else yield sub } let decompressV1 input = expand false 0 input |> System.String.Concat decompressV1 "ADVENT" |> printfn "%s" // ADVENT decompressV1 "A(1x5)BC" |> printfn "%s" // ABBBBBC decompressV1 "(3x3)XYZ" |> printfn "%s" // XYZXYZXYZ decompressV1 "(6x1)(1x3)A" |> printfn "%s" // (1x3)A decompressV1 "X(8x2)(3x3)ABCY" |> printfn "%s" // X(3x3)ABC(3x3)ABCY (decompressV1 input).Length |> printfn "Part a: %d"

So this approach works fine and can be used to solve part b as well,

let decompressV2 input = expand true 0 input |> Seq.sumBy (fun f -> int64 f.Length) decompressV2 "(27x12)(20x12)(13x14)(7x10)(1x12)A" |> printfn "Test: %d" // 241920 decompressV2 "(25x3)(3x3)ABC(2x3)XY(5x2)PQRSTX(18x9)(3x2)TWO(5x7)SEVEN" |> printfn "Test: %d" // 445 decompressV2 input |> printfn "Part b: %d"

But this is very slow (20 minutes on my machine) since the decompressed string is extremely long, meaning we have to iterate through a very long and repetitive sequence of expansions. A more sane way is to write an algorithm just keeping track of the length of the decoded string, which greatly reduces the work required since we can just multiply the length of an expanded substring by its number of repetitions.

So here’s the solution I used for part b (which was actually what I created first):

let rec decompressLen (input:string) = let rec expand (parsePos, decodedLen) = let m = Regex.Match(input.Substring(parsePos), @"\((\d+)x(\d+)\)") if m.Success then let chars = int m.Groups.[1].Value let repeats = int64 m.Groups.[2].Value let repeat = input.Substring(parsePos + m.Index + m.Length, chars) let expandedLen = (decompressLen repeat) * repeats expand (parsePos + m.Index + m.Length + chars, decodedLen + int64 m.Index + expandedLen) else decodedLen + int64(input.Length-parsePos) expand (0,0L) decompressLen "(27x12)(20x12)(13x14)(7x10)(1x12)A" |> printfn "Test: %d" // 241920 decompressLen "(25x3)(3x3)ABC(2x3)XY(5x2)PQRSTX(18x9)(3x2)TWO(5x7)SEVEN" |> printfn "Test: %d" // 445 decompressLen input |> printfn "Part b: %d"

These two solutions show the classic trade-off between single and general purpose code. The optimised solution runs far quicker to get the length of decompressed data, but if the part b question had been “what is the 10 millionth character in the decompressed string”, our general purpose function would have been able to give us the answer very easily whilst the optimised solution would be of no use.

So in the real world, choosing between two implementation options requires that we have some idea of what the future requirements might be, and that we have an idea of what sort of input we might expect (e.g. do we need to cope with very large input data?). Usually we end up guessing at both.