## Fenwick trees demystified

A Fenwick tree is a clever way to represent a list of numbers in an array, which allows arbitrary-length prefix sums to be calculated efficiently. (For example, the list [1,2,3,4,5] has a length-3 prefix [1,2,3] with sum 1+2+3 = 6.) This is useful in various scenarios, most notably to implement arithmetic coding, which requires dynamic tracking of cumulative frequencies of previously encountered symbols.

The data structure was introduced by Peter Fenwick in 1994, in a report titled “A new data structure for cumulative frequency tables”. Fenwick called it a Binary Indexed Tree, after the observation that the binary representation of indices determines the implicit tree structure, but the term Fenwick tree seems to be more popular today. Many articles are already available online that explain how a Fenwick tree may be implemented. Unfortunately, these articles invariably fail to explain how it was derived.

Fenwick himself shares the responsibility for the confusion, since he did not bother to discuss the history of the data structure in his publication. This has lead readers to believe there is something magical about the particular layout that is used, and caused programmers to blindly copy the source code from Fenwick's report or another source, instead of trying to understand the underlying principle first and then deriving the necessary code themselves. That's a pity, since proper understanding of a solution is necessary to extend, adapt or reconstruct it.

In this article I will try to fill this gap in public knowledge by explaining how the Fenwick tree structure and the algorithms that operate on it can be derived from scratch.

Read more »

The data structure was introduced by Peter Fenwick in 1994, in a report titled “A new data structure for cumulative frequency tables”. Fenwick called it a Binary Indexed Tree, after the observation that the binary representation of indices determines the implicit tree structure, but the term Fenwick tree seems to be more popular today. Many articles are already available online that explain how a Fenwick tree may be implemented. Unfortunately, these articles invariably fail to explain how it was derived.

Fenwick himself shares the responsibility for the confusion, since he did not bother to discuss the history of the data structure in his publication. This has lead readers to believe there is something magical about the particular layout that is used, and caused programmers to blindly copy the source code from Fenwick's report or another source, instead of trying to understand the underlying principle first and then deriving the necessary code themselves. That's a pity, since proper understanding of a solution is necessary to extend, adapt or reconstruct it.

In this article I will try to fill this gap in public knowledge by explaining how the Fenwick tree structure and the algorithms that operate on it can be derived from scratch.

Read more »

## Cyber Crime Challenge 0xFFD

Afgelopen maand werd de Cybercrime Challenge gehouden: een digitale speurtocht georganiseerd door het Team High Tech Crime van de Nederlandse politie, in samenwerking met Tweakers.net en Certified Secure. In deze weblogpost geef ik de antwoorden van de challenge en de methode die ik gebruikt heb om die antwoorden te vinden.

Wie het leuk vindt om de challenge zelf te doen, moet vooral niet verder lezen!

Lees verder »

Wie het leuk vindt om de challenge zelf te doen, moet vooral niet verder lezen!

Lees verder »

## High-Color GIF Images

In the nineties the prevalent image format used on the world wide web was the GIF file format (full spec), which is reasonably efficient, widely portable, and supports animation and transparency. Mainly inspired by concerns over patent claims by Unisys on the LZW compression algorithm used to create GIF files, proponents of open web standards designed the PNG file format (based on the DEFLATE compression algorithm, which is believed to be unencumbered by patents) as a replacement. PNG does have several advantages over GIF, but some of the PNG-advocacy from the time was misleading.

One common reason suggested to prefer PNG over GIF, was that GIF files are limited to using 256-color palettes, and therefore unsuitable for rendering color images faithfully. It turns out this isn't quite true! GIF files can, in fact, contain an arbitrary number of colors, and in this post I will show how such high-color GIF images can be constructed that are quite suitable for web use.

Since a single image can display multiple graphic rendering blocks at once (either by rendering separate blocks into separate regions of the canvas, or by using transparency to overlap blocks without overwriting all previous pixels) the final image can contain an arbitrary number of different colors.

An easy approach to generating high-color GIF images is to partition the set of colors used over multiple frames which are then rendered on top of each other. The first frame contains the pixels with the 256 most-commonly used colors, while each next frame adds 255 new colors (reserving one palette index for transparency).

In the table above, the image on the left is a full-color (and rather colorful) PNG image. The middle image is a GIF file encoded using the technique described above. Some problems with the approach are apparent. The first is that modern browsers insert an artificial rendering delay between frames (refresh the page if you missed it while reading the introduction!) As a consequence, it takes a while for the picture to take shape, because each frame only contains a small fraction of the total image. And finally, because there are so many frames, the resulting file is rather large!

The awkwardness of incremental rendering can be ameliorated by encoding more image information into the first few frames. Instead of rendering only the pixels with exactly matching colors on each frame, we can approximate the other pixels with the nearest available color too. This means the first frame already contains an approximation of the full image, and that subsequent frames improve on (part of) the image by rendering additional details. Besides looking better when rendered incrementally, this has the advantage that GIF renderers that fail to render all frames will still display a reasonable approximation of the final image. The image on the right in the table above is the result of approximating the first few frames this way.

The table below shows the result of this approximate encoding of frames. The first row contains the first few frames of the GIF file. Note that the first image contains an approximation of all the pixels, while subsequent frames use transparency on pixels that cannot be better approximated using the frame-local palette. The second row shows the composite of the frames rendered so far, while the third row shows the symmetric difference with the final image (which should fade to black eventually).

This approximation technique, unfortunately, creates somewhat larger files, because frames tend to contain fewer transparent pixels and therefore aren't compressed as much.

However, the more important cause of the large file size of these GIF files, is that they span a lot of frames (143 for just this small image!) despite the fact (as the image above shows) that after a few frames, the end result is already approximated relatively well. It seems dubious whether the last hundred frames or so really add much information to the image, even though they are required to reproduce all pixels exactly.

In a GIF file ony pixel data is compressed while color tables are not. Therefore, every additional color requires at least three more bytes in the file (one byte for each component: red, green and blue) and using a large number of distinct colors isn't particularly efficient.

From a practical point of view, it makes more sense to reduce the number of different colors in the image. For example, if we want to use just 10 frames, we can still render 2551 different colors, which is certainly a big improvement over the 256 color palette GIF files typically use. This seems like a good compromise between image quality and file size.

To show how well this works, here is a 2551-color high-resolution version of the image shown before:

Yes, that is really a GIF image you're looking at! The table below summarizes the results for various image formats. Note that the simple 10-frame GIF image is actually smaller than the PNG image after quantization.

Since I couldn't find a good color quantization tool that works with more than 256 colors, I implemented my own using K-means clustering:

One common reason suggested to prefer PNG over GIF, was that GIF files are limited to using 256-color palettes, and therefore unsuitable for rendering color images faithfully. It turns out this isn't quite true! GIF files can, in fact, contain an arbitrary number of colors, and in this post I will show how such high-color GIF images can be constructed that are quite suitable for web use.

#### Technical details

Althought it's true that GIF files are limited to an 8-bit pixel format (each pixel value indexing a table of at most 256 colors) a single GIF image may contain multiple graphic rendering blocks, each of which may include its own local color table of 256 colors drawn from a 24-bit color space.Since a single image can display multiple graphic rendering blocks at once (either by rendering separate blocks into separate regions of the canvas, or by using transparency to overlap blocks without overwriting all previous pixels) the final image can contain an arbitrary number of different colors.

An easy approach to generating high-color GIF images is to partition the set of colors used over multiple frames which are then rendered on top of each other. The first frame contains the pixels with the 256 most-commonly used colors, while each next frame adds 255 new colors (reserving one palette index for transparency).

170x240 pixels, 36330 colors, 24-bit PNG, 88 KB | 170x240 pixels, 36330 colors, 143 frames GIF, 242 KB | 170x240 pixels, 36330 colors, 143 frames GIF, 337 KB |

In the table above, the image on the left is a full-color (and rather colorful) PNG image. The middle image is a GIF file encoded using the technique described above. Some problems with the approach are apparent. The first is that modern browsers insert an artificial rendering delay between frames (refresh the page if you missed it while reading the introduction!) As a consequence, it takes a while for the picture to take shape, because each frame only contains a small fraction of the total image. And finally, because there are so many frames, the resulting file is rather large!

The awkwardness of incremental rendering can be ameliorated by encoding more image information into the first few frames. Instead of rendering only the pixels with exactly matching colors on each frame, we can approximate the other pixels with the nearest available color too. This means the first frame already contains an approximation of the full image, and that subsequent frames improve on (part of) the image by rendering additional details. Besides looking better when rendered incrementally, this has the advantage that GIF renderers that fail to render all frames will still display a reasonable approximation of the final image. The image on the right in the table above is the result of approximating the first few frames this way.

The table below shows the result of this approximate encoding of frames. The first row contains the first few frames of the GIF file. Note that the first image contains an approximation of all the pixels, while subsequent frames use transparency on pixels that cannot be better approximated using the frame-local palette. The second row shows the composite of the frames rendered so far, while the third row shows the symmetric difference with the final image (which should fade to black eventually).

This approximation technique, unfortunately, creates somewhat larger files, because frames tend to contain fewer transparent pixels and therefore aren't compressed as much.

However, the more important cause of the large file size of these GIF files, is that they span a lot of frames (143 for just this small image!) despite the fact (as the image above shows) that after a few frames, the end result is already approximated relatively well. It seems dubious whether the last hundred frames or so really add much information to the image, even though they are required to reproduce all pixels exactly.

In a GIF file ony pixel data is compressed while color tables are not. Therefore, every additional color requires at least three more bytes in the file (one byte for each component: red, green and blue) and using a large number of distinct colors isn't particularly efficient.

From a practical point of view, it makes more sense to reduce the number of different colors in the image. For example, if we want to use just 10 frames, we can still render 2551 different colors, which is certainly a big improvement over the 256 color palette GIF files typically use. This seems like a good compromise between image quality and file size.

To show how well this works, here is a 2551-color high-resolution version of the image shown before:

Yes, that is really a GIF image you're looking at! The table below summarizes the results for various image formats. Note that the simple 10-frame GIF image is actually smaller than the PNG image after quantization.

Full image | Full image | Full image | Full image | Full image |

679x960 pixels, 436186 colors, 24-bit PNG, 1235 KB | 679x960 pixels, 2551 colors, 24-bit PNG, 1194 KB | 679x960 pixels, 2551 colors, 10-frame GIF, 1109 KB | 679x960 pixels, 2551 colors, 10-frame GIF, 1483 KB | 679x960 pixels, 256 colors, 1-frame GIF, 415 KB |

#### Conclusion

Although the PNG file format is arguably superior to GIF in many ways, it is possible to render colorful images from GIF files. Even so, it makes sense to reduce the color palette somewhat to avoid creating excessively large files. One practical problem with this approach is web browsers' failure to render multiple blocks without delay; this is arguably a bug that ought to be fixed.#### Tools used

In case you want to perform similar experiments, I will list the tools I've used to create the images in this post (mostly written in Python). Feel free to use them however you like.- high-color-gif-simple.py, for the straightforward/smaller file encoding.
- high-color-gif-fancy.py: for the approximating/larger file encoding

Since I couldn't find a good color quantization tool that works with more than 256 colors, I implemented my own using K-means clustering:

- quantize-kmeans2.py is simple and works well, but rather slow (especially for large images and color palettes).
- quantize-kmeans.c is my own implementation of Lloyd's algorithm in C instead, which is what I used to create the 2551 color image above.

## Facebook Hacker Cup 2013: round 1 problem analysis

Round 1 of the Facebook Hacker Cup has just ended. Competitors were required to solve three nicely-balanced problems: all of them required substantial thought, without being unreasonably difficult. Like last week, I will describe my solutions to the problems, although this time the solution source code is written in C++.

The easiest problem in the contest is a combinatorial one. We are given an array A of N distinct integers, and are asked to calculate for each element A[i] how many subsets of A of size K exist, such that A[i] is the largest in the set. Then, we must multiply A[i] by the number of such sets, sum the results, and report the answer modulo 1,000,000,007.

On a high level, this isn't very difficult. If we start by sorting the array A, then we can simply compute the answer as:

We can implement this idea in at least three different ways, and the differences are mostly in how we implement the choice function. The most obvious way is to use mathematical definion:

If we want to limit ourselves to 64-bit integer arithmetic only, which is a lot faster, we should perform all calculations modulo 1,000,000,007. But in that case, the division is problematic, because the modulo operator doesn't distribute over division; i.e. generally (x / y) mod z ≠ (x mod z) / (y mod z).

There are two ways to overcome this problem. One is to note that 1,000,000,007 is a prime number, and thus we can calculate the multiplicative inverse of each factorial using e.g. the extended Euclidean algorithm, and multiply with the inverse instead of dividing. This yields on O(N log N) solution, which is pretty good considering that N ≤ 10,000.

During the contest I used a more brute-force approach, however, by simply pre-calculating all relevant binomial coefficients using the recurrence relation:

C++:

I suspect it is also possible to combine the summation with the calculation of the combinations, resulting in a simpler solution that avoids the above complications.

For the second problem, we must determine the lexicographically (read: alphabetically) smallest key string that is consistent with the output produced by the specified transformation, or output "IMPOSSIBLE" if no valid answer exists.

Practically speaking, we need to fill in the blanks in K1 in such a way that it is possible to rearrange its M parts into K2. The key to solving this problem is to realize that if we can determine whether an answer exists at all, we can use that same method to construct the lexicographically least possible answer by filling in the blanks in K1 from left to right, and from 'a' to 'z'.

We can model the problem as a bipartite graph where each vertex corresponds with the parts of K1 and K2, and an edge exists between two parts if the strings match (in the sense that "ab?" matches "?bc" because on every position all characters match). Then a solution exists if and only if the maximum matching in this graph contains exactly M edges.

For example, if K1="ab???c?ab" and K2="a?ccab?a?" and M=3 (so each key consists of three parts of three characters) then the corresponding graph looks like this:

There are lines connecting the parts that can be matched together, and the green edges indicate the maximum possible matching. Since all parts are matched, there is a solution for the given pair of keys.

I implemented a maximum matching algorithm based on augmenting paths. How efficient is this approach? We may need to determine if a pair of strings are compatible at most 6×N times (once for each possible blank in K1, multiplied by the number of possible letters: a-f).

For each such case, we must construct a bipartite graph which will have at most M×M edges and find upto M augmenting paths. Finding a single augmenting path requires going through all edges at most once, so this part of the algorithm takes O(M^3) time.

The total runtime of the algorithm is therefore O(6*N*M^3), or about 75 million steps in the worst case, which is acceptable.

C++:

For the third problem, I used a brute-force approach again. The constraints are such that it is barely feasible to iterate over all possible window placements, except that we must be able to determine if each placement is possible nearly instantly, or we'll run out of time.

The key to solving this problem is to first reduce it from two dimensions to one. Suppose the height of the window to be placed (Q) is equal to the height of the screen (H). Then, the Y-coordinate of dead pixels doesn't matter, and we are only interested in which columns are blocked by a dead pixel. If we iterate over the blocked columns from left to right, we can easily track the gaps between dead columns; if we find a gap of width G, then we can place a window in the gap if its width P ≤ G. In fact, there are exactly (G - P + 1) ways to place the window in the gap.

To extend this idea to two dimensions, we start by assuming the window is placed against the top of the screen. We can then count how many dead pixels occur in each column for the first Q rows of the screen (with 0-based indices 0 through Q-1), and scan the columns for gaps as described above, counting all valid placements where the top of the window is at Y-coordinate 0. If we move the window down one row (to Y-coordinate 1), then we should update the count of dead pixels per column by adding the dead pixels on row Q and subtracting the dead pixels on row 0 (which no overlap the window). We can repeat this process until we reach the bottom of the screen.

This solution requires Θ(W×H) time. Since W,H ≤ 40,000 that could take a whopping 1,600,000,000 operations per test case; that's a lot! Fortunately, each operation is very simple, so implemented in C++ this solution takes no more than a few seconds per testcase. I wouldn't want to try this in a scripting language, though!

C++:

#### Problem 1: Card Game (20 points)

Problem statement here.The easiest problem in the contest is a combinatorial one. We are given an array A of N distinct integers, and are asked to calculate for each element A[i] how many subsets of A of size K exist, such that A[i] is the largest in the set. Then, we must multiply A[i] by the number of such sets, sum the results, and report the answer modulo 1,000,000,007.

On a high level, this isn't very difficult. If we start by sorting the array A, then we can simply compute the answer as:

- sum A[i] × C(i, K - 1) mod 1,000,000,007 (for 0 ≤ i < N)

We can implement this idea in at least three different ways, and the differences are mostly in how we implement the choice function. The most obvious way is to use mathematical definion:

- C(n,k) = n! / k! / (n - k)!

If we want to limit ourselves to 64-bit integer arithmetic only, which is a lot faster, we should perform all calculations modulo 1,000,000,007. But in that case, the division is problematic, because the modulo operator doesn't distribute over division; i.e. generally (x / y) mod z ≠ (x mod z) / (y mod z).

There are two ways to overcome this problem. One is to note that 1,000,000,007 is a prime number, and thus we can calculate the multiplicative inverse of each factorial using e.g. the extended Euclidean algorithm, and multiply with the inverse instead of dividing. This yields on O(N log N) solution, which is pretty good considering that N ≤ 10,000.

During the contest I used a more brute-force approach, however, by simply pre-calculating all relevant binomial coefficients using the recurrence relation:

- C(n,0) = 1
- C(n,n) = 1
- C(n,k) = C(n-1,k-1) + C(n-1,k)

C++:

```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
``` | #include <algorithm> #include <iostream> #include <vector> static const int MaxN = 10000, Mod = 1000000007; static int nCr[MaxN+1][MaxN+1]; int main() { // Precalculate binomial coefficients: for (int n = 0; n <= MaxN; ++n) { nCr[n][0] = nCr[n][n] = 1; for (int r = 1; r < n; ++r) nCr[n][r] = (nCr[n - 1][r] + nCr[n - 1][r - 1])%Mod; } int cases = 0; std::cin >> cases; for (int caseno = 1; caseno <= cases; ++caseno) { // Read input: int N = 0, K = 0; std::cin >> N >> K; std::vector<int> A(N); for (int i = 0; i < N; ++i) std::cin >> A[i]; // Calculate sum: std::sort(A.begin(), A.end()); int answer = 0; for (int i = K - 1; i < N; ++i) answer = (answer + (long long)nCr[i][K - 1]*A[i])%Mod; std::cout << "Case #" << caseno << ": " << answer << std::endl; } } |

I suspect it is also possible to combine the summation with the calculation of the combinations, resulting in a simpler solution that avoids the above complications.

#### Problem 2: Security (35 points)

Problem statement here.For the second problem, we must determine the lexicographically (read: alphabetically) smallest key string that is consistent with the output produced by the specified transformation, or output "IMPOSSIBLE" if no valid answer exists.

Practically speaking, we need to fill in the blanks in K1 in such a way that it is possible to rearrange its M parts into K2. The key to solving this problem is to realize that if we can determine whether an answer exists at all, we can use that same method to construct the lexicographically least possible answer by filling in the blanks in K1 from left to right, and from 'a' to 'z'.

We can model the problem as a bipartite graph where each vertex corresponds with the parts of K1 and K2, and an edge exists between two parts if the strings match (in the sense that "ab?" matches "?bc" because on every position all characters match). Then a solution exists if and only if the maximum matching in this graph contains exactly M edges.

For example, if K1="ab???c?ab" and K2="a?ccab?a?" and M=3 (so each key consists of three parts of three characters) then the corresponding graph looks like this:

There are lines connecting the parts that can be matched together, and the green edges indicate the maximum possible matching. Since all parts are matched, there is a solution for the given pair of keys.

I implemented a maximum matching algorithm based on augmenting paths. How efficient is this approach? We may need to determine if a pair of strings are compatible at most 6×N times (once for each possible blank in K1, multiplied by the number of possible letters: a-f).

For each such case, we must construct a bipartite graph which will have at most M×M edges and find upto M augmenting paths. Finding a single augmenting path requires going through all edges at most once, so this part of the algorithm takes O(M^3) time.

The total runtime of the algorithm is therefore O(6*N*M^3), or about 75 million steps in the worst case, which is acceptable.

C++:

```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
``` | #include <iostream> #include <string> #include <vector> static std::vector<std::vector<int> > adj; static std::vector<int> matchL, matchR; static std::vector<char> visited; static bool match(int i) { if (visited[i]) return false; visited[i] = true; for (size_t n = 0; n < adj[i].size(); ++n) { const int j = adj[i][n]; if (matchR[j] < 0 || match(matchR[j])) { matchL[i] = j; matchR[j] = i; return true; } } return false; } static bool match_one() { for (int i = 0; i < adj.size(); ++i) { if (matchL[i] < 0 && match(i)) return true; } return false; } static int maximum_matching() { int res = 0; matchL.assign(adj.size(), -1); matchR.assign(adj.size(), -1); while (visited.assign(adj.size(), 0), match_one()) ++res; return res; } static bool compatible(const std::string &s, const std::string &t) { for (size_t i = 0; i < s.size(); ++i) { if (s[i] != '?' && t[i] != '?' && s[i] != t[i]) return false; } return true; } static bool possible(const std::string &K1, const std::string &K2, int M) { int L = K1.size() / M; adj.assign(M, std::vector<int>()); for (int i = 0; i < M; ++i) { for (int j = 0; j < M; ++j) { if (compatible(K1.substr(L*i, L), K2.substr(L*j, L))) adj[i].push_back(j); } } return maximum_matching() == M; } int main() { int cases = 0; std::cin >> cases; for (int caseno = 1; caseno <= cases; ++caseno) { int M; std::string K1, K2; std::cin >> M >> K1 >> K2; std::cout << "Case #" << caseno << ": "; if (!possible(K1, K2, M)) { std::cout << "IMPOSSIBLE" << std::endl;; continue; } for (std::string::iterator it = K1.begin(); it != K1.end(); ++it) { if (*it == '?') { *it = 'a'; while (!possible(K1, K2, M)) ++*it; } } std::cout << K1 << std::endl; } } |

#### Problem 3: Dead Pixels (45 points)

Problem statement here.For the third problem, I used a brute-force approach again. The constraints are such that it is barely feasible to iterate over all possible window placements, except that we must be able to determine if each placement is possible nearly instantly, or we'll run out of time.

The key to solving this problem is to first reduce it from two dimensions to one. Suppose the height of the window to be placed (Q) is equal to the height of the screen (H). Then, the Y-coordinate of dead pixels doesn't matter, and we are only interested in which columns are blocked by a dead pixel. If we iterate over the blocked columns from left to right, we can easily track the gaps between dead columns; if we find a gap of width G, then we can place a window in the gap if its width P ≤ G. In fact, there are exactly (G - P + 1) ways to place the window in the gap.

To extend this idea to two dimensions, we start by assuming the window is placed against the top of the screen. We can then count how many dead pixels occur in each column for the first Q rows of the screen (with 0-based indices 0 through Q-1), and scan the columns for gaps as described above, counting all valid placements where the top of the window is at Y-coordinate 0. If we move the window down one row (to Y-coordinate 1), then we should update the count of dead pixels per column by adding the dead pixels on row Q and subtracting the dead pixels on row 0 (which no overlap the window). We can repeat this process until we reach the bottom of the screen.

This solution requires Θ(W×H) time. Since W,H ≤ 40,000 that could take a whopping 1,600,000,000 operations per test case; that's a lot! Fortunately, each operation is very simple, so implemented in C++ this solution takes no more than a few seconds per testcase. I wouldn't want to try this in a scripting language, though!

C++:

```
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
``` | #include <iostream> #include <vector> int main() { int cases = 0; std::cin >> cases; for (int caseno = 1; caseno <= cases; ++caseno) { // Read input (that's a lot of parameters!) int W, H, P, Q, N, X, Y, a, b, c, d; std::cin >> W >> H >> P >> Q >> N >> X >> Y >> a >> b >> c >> d; // Generate a list of dead pixels (grouped by row) std::vector<std::vector<int> > dead(H); for (int x = X, y = Y, i = 0; i < N; ++i) { int nx = (x*a + y*b + 1)%W, ny = (x*c + y*d + 1)%H; dead[y].push_back(x); x = nx; y = ny; } // Scan row by row, tracking which columns are blocked by dead pixels: std::vector<int> blocked(W + 1); blocked[W] = 1; for (int y = 0; y < Q - 1; ++y) { for (std::vector<int>::iterator it = dead[y].begin(); it != dead[y].end(); ++it) ++blocked[*it]; } int answer = 0; for (int y = 0; y + Q <= H; ++y) { // Add bottom row for (std::vector<int>::iterator it = dead[y + Q - 1].begin(); it != dead[y + Q - 1].end(); ++it) ++blocked[*it]; // Count number of valid horizontal placements: int edge = P; for (int c = 0; c <= W; ++c) { if (blocked[c]) { if (edge <= c) answer += c - edge + 1; edge = c + 1 + P; } } // Remove top row for (std::vector<int>::iterator it = dead[y].begin(); it != dead[y].end(); ++it) --blocked[*it]; } std::cout << "Case #" << caseno << ": " << answer << std::endl; } } |

## Facebook Hacker Cup 2013: qualification round problem analysis

As in previous years, I will be competing in the Facebook Hacker Cup, and I will describe the solutions I come up with on this weblog, hoping that other programmers or fellow competitors find them interesting.

I try to balance brevity with rigor: pasting just my solution code would not be very informative, but detailed proofs get boring quickly. Aiming for a happy medium, I will describe my solution approach before presenting the corresponding source code, adding proof outlines where necessary and linking to Wikipedia for detailed explanations of well-known topics.

This post contains source code written in Python. Unfortunately, Tweakers.net persists in their failure to support syntax highlighting for this popular language, which is why you will see screen shots below (but don't worry: links to the raw source code are provided as well).

We are asked to maximize the total “beauty” of a string, calculated as the sum of the beauty of the letters in the string, by assigning optimal values to different letters. The intuitive approach is to greedily assign the highest value (26) to the most common letter, the next highest value (25) to the next most common letter, and so on. Before coding this up, let's try to prove that the intuition is correct.

Formally, if we call

This condition is necessary, because if

Now that we have proven the greedy approach to be correct, we can implement it in Python as follows:

If we ignore the smileys for a moment, the problem reduces to checking if all parentheses in the input are properly balanced. We can check this in linear time by scanning the string once (e.g. from left to right) and tracking the current nesting depth, which is increased for every opening parenthesis we encounter, and decreased for every closing parenthesis.

Using this approach, the string is well-formed if and only if:

But this string has an unmatched opening parenthesis, and thus violates rule 1:

And this string has an unmatched closing parethesis, which violates rule 2:

This approach works well with just parentheses, but the presence of smileys complicates matters, because we don't know in advance if we should count them as parentheses or not. Fortunately, we can adapt the above algorithm to deal with this uncertainty. Instead of tracking a single nesting depth value at each position, we should keep track of a set of integers representing all possible nesting depths.

Since this set will necessarily consist of consecutive integers, we can just store the minimum and maximum elements (knowing that all values in between are possible too). Again, we conclude that the string is well-formed if the lower-bound at the end is 0, and the upper bound never becomes negative (which would imply the set of possibilities is empty).

For example, this string is well-formed:

This idea can be implemented succinctly in Python:

Note that this solution asymptotically optimal: it requires linear time and constant space.

The final problem looks complicated, with all the parameters and formulas described in the problem statement, but we can approach it systematically by breaking it down into simpler subproblems.

First, the problem statement dictates that the input is generated using a pseudo-random linear congruential generator. This is only done to keep the size of the input files small, so we can generate the first

Although these first

This means that if we can compute the elements at indices

The first data structure counts how often each distinct value is

The second data structure is an ordered collection of integers (between

Note that the

Now consider how these data structures are updated when the window slides to the right. First, to determine

The implementation in Python looks like this:

Since heap operations on a list of size O(

I try to balance brevity with rigor: pasting just my solution code would not be very informative, but detailed proofs get boring quickly. Aiming for a happy medium, I will describe my solution approach before presenting the corresponding source code, adding proof outlines where necessary and linking to Wikipedia for detailed explanations of well-known topics.

This post contains source code written in Python. Unfortunately, Tweakers.net persists in their failure to support syntax highlighting for this popular language, which is why you will see screen shots below (but don't worry: links to the raw source code are provided as well).

#### Problem A: Beautiful Strings (20 points)

(Full problem statement here.)We are asked to maximize the total “beauty” of a string, calculated as the sum of the beauty of the letters in the string, by assigning optimal values to different letters. The intuitive approach is to greedily assign the highest value (26) to the most common letter, the next highest value (25) to the next most common letter, and so on. Before coding this up, let's try to prove that the intuition is correct.

Formally, if we call

*value(*the assigned value of letter**x**)**x**, and*count(*the number of times it occurs in the input string, then the total beauty equals the sum of**x**)*count(*for all**x**) × value(**x**)**x**, and we claim that a valuation is optimal if (and only if):*value(*if**x**) > value(**y**)*count(*.**x**) > count(**y**)This condition is necessary, because if

*value(*while**x**) > value(**y**)*count(*, then swapping the values would increase the total beauty by**x**) < count(**y**)*(value(*and therefore such a valuation cannot be optimal. The condition is also sufficient, because exchanging values for letters which occur equally often does not change the total beauty.**x**) - value(**y**)) × (count(**y**) - count(**x**))Now that we have proven the greedy approach to be correct, we can implement it in Python as follows:

#### Problem B: Balanced Smileys (35 points)

(Full problem statement here.)If we ignore the smileys for a moment, the problem reduces to checking if all parentheses in the input are properly balanced. We can check this in linear time by scanning the string once (e.g. from left to right) and tracking the current nesting depth, which is increased for every opening parenthesis we encounter, and decreased for every closing parenthesis.

Using this approach, the string is well-formed if and only if:

- we end at nesting depth 0, and
- the nesting depth never drops below 0.

Input text: | a | ( | b | ( | c | ) | d | ( | e | ) | ) | f | ( | g | ) | h | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Nesting depth: | 0 | 0 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |

But this string has an unmatched opening parenthesis, and thus violates rule 1:

Input text: | a | ( | b | ( | c | ) | d | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Nesting depth: | 0 | 0 | 1 | 1 | 2 | 2 | 1 | 1 |

And this string has an unmatched closing parethesis, which violates rule 2:

Input text: | a | ) | b | ( | c | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|

Nesting depth: | 0 | 0 | -1 | -1 | 0 | 0 |

This approach works well with just parentheses, but the presence of smileys complicates matters, because we don't know in advance if we should count them as parentheses or not. Fortunately, we can adapt the above algorithm to deal with this uncertainty. Instead of tracking a single nesting depth value at each position, we should keep track of a set of integers representing all possible nesting depths.

Since this set will necessarily consist of consecutive integers, we can just store the minimum and maximum elements (knowing that all values in between are possible too). Again, we conclude that the string is well-formed if the lower-bound at the end is 0, and the upper bound never becomes negative (which would imply the set of possibilities is empty).

For example, this string is well-formed:

Input text: | ( | ( | : ) | : ) | ) | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|

Lower bound: | 0 | 1 | 2 | 1 | 0 | 0 | |||||

Upper bound: | 0 | 1 | 2 | 2 | 2 | 1 |

This idea can be implemented succinctly in Python:

Note that this solution asymptotically optimal: it requires linear time and constant space.

#### Problem C: Find The Min (45 points)

(Full problem statement here.)The final problem looks complicated, with all the parameters and formulas described in the problem statement, but we can approach it systematically by breaking it down into simpler subproblems.

First, the problem statement dictates that the input is generated using a pseudo-random linear congruential generator. This is only done to keep the size of the input files small, so we can generate the first

**K**elements of the array using the the provided formula, and then forget about the RNG parameters for the rest of the problem.Although these first

**K**values could be anything, we can make some useful observations about the contents of the array*after*the initial**K**elements:- Every element will be between
**0**and**K**(inclusive) by the pigeonhole principle. - Consequently, every window of
**K + 1**consecutive elements will contain each value between**0**and**K**exactly once (i.e. it contains a permutation of the integers**0**through**K**). - Consequently, for
**i > 2K**:**M[i] = M[i - (K + 1)]**.

**K + 1**. Below is a simple example with**K = 4, N = 18**, where this is property is clear:Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Value | 3 | 1 | 4 | 1 | 0 | 2 | 3 | 4 | 1 | 0 | 2 | 3 | 4 | 1 | 0 | 2 | 3 | 4 |

This means that if we can compute the elements at indices

**K**through**2K**(inclusive), we have effectively computed them all.**K**is not ridiculously large (at most 100,000) but we should still be somewhat efficient in our implementation. I used a*sliding window algorithm*in which the array is calculated from left to right, while two data structures are maintained that contain information about the preceding K elements which is used to quickly calculated new elements.The first data structure counts how often each distinct value is

**present**in the window of K preceding elements. This could be a simple array of**K+1**integers (though I found Python's Counter class slightly more convenient).The second data structure is an ordered collection of integers (between

**0**and**K**, inclusive) that are**missing**in the same window. Of course, I want to take the minimum element from this list at each step, and I want to be able to update it efficiently. Therefore, a plain list isn't the right choice. Instead, I will use a heap structure, although an ordered binary search tree (like Java's TreeSet or C++'s std::set) would also be appropriate.Note that the

**present**and**missing**data structures complement each other: if a value is stored in**missing**, then its count in**present**will be zero. And vice versa: if a value is not in**missing**then it must appear in the current window, and its count in**present**will be nonzeroNow consider how these data structures are updated when the window slides to the right. First, to determine

**M[i]**for an index**i ≥ K**, I can remove the lowest value from the**missing**set, and then increment**present[M[i]]**, thus extending the window on the right by one element. To shrink the window on the left, I need to decrement**present[M[i - K]]**. If the resulting count has reached zero, that means**M[i - K]**doesn't occur anywhere else in the search window, and it should be added to**missing**.The implementation in Python looks like this:

Since heap operations on a list of size O(

**K**) take O(log**K**) time, this algorithm runs in O(**K**×log**K**) time and O(**K**) space. Although this is fast enough for this contest, I suspect this is not optimal, and O(**K**) time should be possible too. If you know how to do it, please leave a comment describing your approach!