Facebook Hacker Cup 2016: Qualification Round problem analysis en

By Soultaker on Tuesday 12 January 2016 01:00 - Comments (1)
Category: Programming Contests, Views: 3.919

As I'm writing this, the Qualification Round of the Facebook Hacker Cup 2016 is about to finish. For those not in the know: the Hacker Cup is an annual algorithmic programming contest, where contestants solve mathematical puzzles by writing some code (no "hacking" in the contemporary sense involved).

Over the past few years I've been posting my analysis of the problems on my blog, and I thought I might as well continue the tradition. As usual, there are major spoilers ahead, as well as code, and some bonus questions for those that managed to solve all the problems already. So let's begin!

Read more »

Facebook Hacker Cup 2015: Qualification Round problem analysis en

By Soultaker on Monday 12 January 2015 09:50 - Comments (5)
Category: Programming Contests, Views: 7.755

Another year, another Facebook Hacker Cup, and another opportunity for smart people who know at least one programming language to put their problem solving skills to the test. As in earlier years, I will post a short analysis along with my solution for each problem, and I'll add some follow-up questions that people who have already solved the problems might find interesting.



A. Cooking the Books

Problem Statement (official link).

The qualification round starts off easily. For the first problem, we're given a number, and we're supposed to turn it into the highest/lowest value possible, by swapping just two digits. Warning: spoilers ahead!

Read more »

Facebook Hacker Cup 2013: round 1 problem analysis en

By Soultaker on Sunday 3 February 2013 19:09 - Comments (17)
Category: Programming Contests, Views: 32.885

Round 1 of the Facebook Hacker Cup has just ended. Competitors were required to solve three nicely-balanced problems: all of them required substantial thought, without being unreasonably difficult. Like last week, I will describe my solutions to the problems, although this time the solution source code is written in C++.

Problem 1: Card Game (20 points)

Problem statement here.

The easiest problem in the contest is a combinatorial one. We are given an array A of N distinct integers, and are asked to calculate for each element A[i] how many subsets of A of size K exist, such that A[i] is the largest in the set. Then, we must multiply A[i] by the number of such sets, sum the results, and report the answer modulo 1,000,000,007.

On a high level, this isn't very difficult. If we start by sorting the array A, then we can simply compute the answer as:
  • sum A[i] ◊ C(i, K - 1) mod 1,000,000,007 (for 0 ≤ i < N)
... where C(n,r) is the combinatorial choice function that calculates in how many ways we can select r distinct elements from a set of n. The idea behind this formula is that if A is sorted, and A[i] is the largest element, then the other K-1 elements must be drawn from A[0]..A[i-1], which are exactly the i elements smaller than A[i].

We can implement this idea in at least three different ways, and the differences are mostly in how we implement the choice function. The most obvious way is to use mathematical definion:
  • C(n,k) = n! / k! / (n - k)!
... where x! denotes the factorial of x. This is reasonably efficient if we precalculate the factorials, but it involves very large integers, so we need to use a language that can handle those.

If we want to limit ourselves to 64-bit integer arithmetic only, which is a lot faster, we should perform all calculations modulo 1,000,000,007. But in that case, the division is problematic, because the modulo operator doesn't distribute over division; i.e. generally (x / y) mod z ≠ (x mod z) / (y mod z).

There are two ways to overcome this problem. One is to note that 1,000,000,007 is a prime number, and thus we can calculate the multiplicative inverse of each factorial using e.g. the extended Euclidean algorithm, and multiply with the inverse instead of dividing. This yields on O(N log N) solution, which is pretty good considering that N ≤ 10,000.

During the contest I used a more brute-force approach, however, by simply pre-calculating all relevant binomial coefficients using the recurrence relation:
  • C(n,0) = 1
  • C(n,n) = 1
  • C(n,k) = C(n-1,k-1) + C(n-1,k)
The advantage of this method is that it doesn't require division at all (or multiplication, for that matter). This yields an O(N◊N) time and space algorithm, but implemented in C++ this is fast enough, especially considering that the table is independent of the actual input and thus needs to be computed only once.


C++:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#include <algorithm>
#include <iostream>
#include <vector>

static const int MaxN = 10000, Mod = 1000000007;
static int nCr[MaxN+1][MaxN+1];

int main()
{
    // Precalculate binomial coefficients:
    for (int n = 0; n <= MaxN; ++n)
    {
        nCr[n][0] = nCr[n][n] = 1;
        for (int r = 1; r < n; ++r)
            nCr[n][r] = (nCr[n - 1][r] + nCr[n - 1][r - 1])%Mod;
    }

    int cases = 0;
    std::cin >> cases;
    for (int caseno = 1; caseno <= cases; ++caseno)
    {
        // Read input:
        int N = 0, K = 0;
        std::cin >> N >> K;
        std::vector<int> A(N);
        for (int i = 0; i < N; ++i)
            std::cin >> A[i];

        // Calculate sum:
        std::sort(A.begin(), A.end());
        int answer = 0;
        for (int i = K - 1; i < N; ++i)
            answer = (answer + (long long)nCr[i][K - 1]*A[i])%Mod;

        std::cout << "Case #" << caseno << ": " << answer << std::endl;
    }
}



I suspect it is also possible to combine the summation with the calculation of the combinations, resulting in a simpler solution that avoids the above complications.

Problem 2: Security (35 points)

Problem statement here.

For the second problem, we must determine the lexicographically (read: alphabetically) smallest key string that is consistent with the output produced by the specified transformation, or output "IMPOSSIBLE" if no valid answer exists.

Practically speaking, we need to fill in the blanks in K1 in such a way that it is possible to rearrange its M parts into K2. The key to solving this problem is to realize that if we can determine whether an answer exists at all, we can use that same method to construct the lexicographically least possible answer by filling in the blanks in K1 from left to right, and from 'a' to 'z'.

We can model the problem as a bipartite graph where each vertex corresponds with the parts of K1 and K2, and an edge exists between two parts if the strings match (in the sense that "ab?" matches "?bc" because on every position all characters match). Then a solution exists if and only if the maximum matching in this graph contains exactly M edges.

For example, if K1="ab???c?ab" and K2="a?ccab?a?" and M=3 (so each key consists of three parts of three characters) then the corresponding graph looks like this:

http://tweakers.net/ext/f/HjAmJLcAvsooq7lA2iwYNwwN/full.png


There are lines connecting the parts that can be matched together, and the green edges indicate the maximum possible matching. Since all parts are matched, there is a solution for the given pair of keys.

I implemented a maximum matching algorithm based on augmenting paths. How efficient is this approach? We may need to determine if a pair of strings are compatible at most 6◊N times (once for each possible blank in K1, multiplied by the number of possible letters: a-f).

For each such case, we must construct a bipartite graph which will have at most M◊M edges and find upto M augmenting paths. Finding a single augmenting path requires going through all edges at most once, so this part of the algorithm takes O(M^3) time.

The total runtime of the algorithm is therefore O(6*N*M^3), or about 75 million steps in the worst case, which is acceptable.


C++:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
#include <iostream>
#include <string>
#include <vector>

static std::vector<std::vector<int> > adj;
static std::vector<int> matchL, matchR;
static std::vector<char> visited;

static bool match(int i)
{
    if (visited[i]) return false;
    visited[i] = true;
    for (size_t n = 0; n < adj[i].size(); ++n)
    {
        const int j = adj[i][n];
        if (matchR[j] < 0 || match(matchR[j])) 
        {
            matchL[i] = j;
            matchR[j] = i;
            return true;
        }
    }
    return false;
}

static bool match_one()
{
    for (int i = 0; i < adj.size(); ++i) 
    {
        if (matchL[i] < 0 && match(i)) return true;
    }
    return false;
}

static int maximum_matching()
{
    int res = 0;
    matchL.assign(adj.size(), -1);
    matchR.assign(adj.size(), -1);
    while (visited.assign(adj.size(), 0), match_one()) ++res;
    return res;
}

static bool compatible(const std::string &s, const std::string &t)
{
    for (size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] != '?' && t[i] != '?' && s[i] != t[i]) return false;
    }
    return true;
}

static bool possible(const std::string &K1, const std::string &K2, int M)
{
    int L = K1.size() / M;
    adj.assign(M, std::vector<int>());
    for (int i = 0; i < M; ++i)
    {
        for (int j = 0; j < M; ++j)
        {
            if (compatible(K1.substr(L*i, L), K2.substr(L*j, L)))
                adj[i].push_back(j);
        }
    }
    return maximum_matching() == M;
}

int main()
{
    int cases = 0;
    std::cin >> cases;
    for (int caseno = 1; caseno <= cases; ++caseno)
    {
        int M;
        std::string K1, K2;
        std::cin >> M >> K1 >> K2;

        std::cout << "Case #" << caseno << ": ";
        if (!possible(K1, K2, M))
        {
            std::cout << "IMPOSSIBLE" << std::endl;;
            continue;
        }
        for (std::string::iterator it = K1.begin(); it != K1.end(); ++it)
        {
            if (*it == '?')
            {
                *it = 'a';
                while (!possible(K1, K2, M))
                    ++*it;
            }
        }
        std::cout << K1 << std::endl;
    }
}

Problem 3: Dead Pixels (45 points)

Problem statement here.

For the third problem, I used a brute-force approach again. The constraints are such that it is barely feasible to iterate over all possible window placements, except that we must be able to determine if each placement is possible nearly instantly, or we'll run out of time.

The key to solving this problem is to first reduce it from two dimensions to one. Suppose the height of the window to be placed (Q) is equal to the height of the screen (H). Then, the Y-coordinate of dead pixels doesn't matter, and we are only interested in which columns are blocked by a dead pixel. If we iterate over the blocked columns from left to right, we can easily track the gaps between dead columns; if we find a gap of width G, then we can place a window in the gap if its width P ≤ G. In fact, there are exactly (G - P + 1) ways to place the window in the gap.

To extend this idea to two dimensions, we start by assuming the window is placed against the top of the screen. We can then count how many dead pixels occur in each column for the first Q rows of the screen (with 0-based indices 0 through Q-1), and scan the columns for gaps as described above, counting all valid placements where the top of the window is at Y-coordinate 0. If we move the window down one row (to Y-coordinate 1), then we should update the count of dead pixels per column by adding the dead pixels on row Q and subtracting the dead pixels on row 0 (which no overlap the window). We can repeat this process until we reach the bottom of the screen.

This solution requires Θ(W◊H) time. Since W,H ≤ 40,000 that could take a whopping 1,600,000,000 operations per test case; that's a lot! Fortunately, each operation is very simple, so implemented in C++ this solution takes no more than a few seconds per testcase. I wouldn't want to try this in a scripting language, though!


C++:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
#include <iostream>
#include <vector>

int main()
{
    int cases = 0;
    std::cin >> cases;
    for (int caseno = 1; caseno <= cases; ++caseno)
    {
        // Read input (that's a lot of parameters!)
        int W, H, P, Q, N, X, Y, a, b, c, d;
        std::cin >> W >> H >> P >> Q >> N >> X >> Y >> a >> b >> c >> d;

        // Generate a list of dead pixels (grouped by row)
        std::vector<std::vector<int> > dead(H);
        for (int x = X, y = Y, i = 0; i < N; ++i)
        {
            int nx = (x*a + y*b + 1)%W,
                ny = (x*c + y*d + 1)%H;
            dead[y].push_back(x);
            x = nx;
            y = ny;
        }
        // Scan row by row, tracking which columns are blocked by dead pixels:
        std::vector<int> blocked(W + 1);
        blocked[W] = 1;
        for (int y = 0; y < Q - 1; ++y)
        {
            for (std::vector<int>::iterator it = dead[y].begin();
                 it != dead[y].end(); ++it) ++blocked[*it];
        }
        int answer = 0;
        for (int y = 0; y + Q <= H; ++y)
        {
            // Add bottom row
            for (std::vector<int>::iterator it = dead[y + Q - 1].begin();
                 it != dead[y + Q - 1].end(); ++it) ++blocked[*it];

            // Count number of valid horizontal placements:
            int edge = P;
            for (int c = 0; c <= W; ++c)
            {
                if (blocked[c])
                {
                    if (edge <= c) answer += c - edge + 1;
                    edge = c + 1 + P;
                }
            }

            // Remove top row
            for (std::vector<int>::iterator it = dead[y].begin();
                 it != dead[y].end(); ++it) --blocked[*it];
        }

        std::cout << "Case #" << caseno << ": " << answer << std::endl;
    }
}

Facebook Hacker Cup 2013: qualification round problem analysis en

By Soultaker on Tuesday 29 January 2013 01:00 - Comments (11)
Category: Programming Contests, Views: 29.097

As in previous years, I will be competing in the Facebook Hacker Cup, and I will describe the solutions I come up with on this weblog, hoping that other programmers or fellow competitors find them interesting.

I try to balance brevity with rigor: pasting just my solution code would not be very informative, but detailed proofs get boring quickly. Aiming for a happy medium, I will describe my solution approach before presenting the corresponding source code, adding proof outlines where necessary and linking to Wikipedia for detailed explanations of well-known topics.

This post contains source code written in Python. Unfortunately, Tweakers.net persists in their failure to support syntax highlighting for this popular language, which is why you will see screen shots below (but don't worry: links to the raw source code are provided as well).


Problem A: Beautiful Strings (20 points)

(Full problem statement here.)

We are asked to maximize the total “beauty” of a string, calculated as the sum of the beauty of the letters in the string, by assigning optimal values to different letters. The intuitive approach is to greedily assign the highest value (26) to the most common letter, the next highest value (25) to the next most common letter, and so on. Before coding this up, let's try to prove that the intuition is correct.

Formally, if we call value(x) the assigned value of letter x, and count(x) the number of times it occurs in the input string, then the total beauty equals the sum of count(x) ◊ value(x) for all x, and we claim that a valuation is optimal if (and only if): value(x) > value(y) if count(x) > count(y).

This condition is necessary, because if value(x) > value(y) while count(x) < count(y), then swapping the values would increase the total beauty by (value(x) - value(y)) ◊ (count(y) - count(x)) and therefore such a valuation cannot be optimal. The condition is also sufficient, because exchanging values for letters which occur equally often does not change the total beauty.

Now that we have proven the greedy approach to be correct, we can implement it in Python as follows:

http://tweakers.net/ext/f/p6eVY275Dk3CKLyQr2c6m398/full.png


Problem B: Balanced Smileys (35 points)

(Full problem statement here.)

If we ignore the smileys for a moment, the problem reduces to checking if all parentheses in the input are properly balanced. We can check this in linear time by scanning the string once (e.g. from left to right) and tracking the current nesting depth, which is increased for every opening parenthesis we encounter, and decreased for every closing parenthesis.

Using this approach, the string is well-formed if and only if:
  1. we end at nesting depth 0, and
  2. the nesting depth never drops below 0.
For example, this is a string with balanced parentheses:
Input text:a(b(c)d(e))f(g)h
Nesting depth:00112211221001100

But this string has an unmatched opening parenthesis, and thus violates rule 1:
Input text:a(b(c)d
Nesting depth:00112211

And this string has an unmatched closing parethesis, which violates rule 2:
Input text:a)b(c
Nesting depth:00-1-100

This approach works well with just parentheses, but the presence of smileys complicates matters, because we don't know in advance if we should count them as parentheses or not. Fortunately, we can adapt the above algorithm to deal with this uncertainty. Instead of tracking a single nesting depth value at each position, we should keep track of a set of integers representing all possible nesting depths.

Since this set will necessarily consist of consecutive integers, we can just store the minimum and maximum elements (knowing that all values in between are possible too). Again, we conclude that the string is well-formed if the lower-bound at the end is 0, and the upper bound never becomes negative (which would imply the set of possibilities is empty).

For example, this string is well-formed:

Input text:((: ): ))
Lower bound:012100
Upper bound:012221

This idea can be implemented succinctly in Python:

http://tweakers.net/ext/f/NZsfMaTL1eWluhd4sE8Wgl2Q/full.png

Note that this solution asymptotically optimal: it requires linear time and constant space.


Problem C: Find The Min (45 points)

(Full problem statement here.)

The final problem looks complicated, with all the parameters and formulas described in the problem statement, but we can approach it systematically by breaking it down into simpler subproblems.

First, the problem statement dictates that the input is generated using a pseudo-random linear congruential generator. This is only done to keep the size of the input files small, so we can generate the first K elements of the array using the the provided formula, and then forget about the RNG parameters for the rest of the problem.

Although these first K values could be anything, we can make some useful observations about the contents of the array after the initial K elements:
  1. Every element will be between 0 and K (inclusive) by the pigeonhole principle.
  2. Consequently, every window of K + 1 consecutive elements will contain each value between 0 and K exactly once (i.e. it contains a permutation of the integers 0 through K).
  3. Consequently, for i > 2K: M[i] = M[i - (K + 1)].
The final conclusion is useful because it implies that the generated array is cyclic with period K + 1. Below is a simple example with K = 4, N = 18, where this is property is clear:
Index†0†1†2†3†4†5†6†7†8†91011121314151617
Value314102341023410234

This means that if we can compute the elements at indices K through 2K (inclusive), we have effectively computed them all. K is not ridiculously large (at most 100,000) but we should still be somewhat efficient in our implementation. I used a sliding window algorithm in which the array is calculated from left to right, while two data structures are maintained that contain information about the preceding K elements which is used to quickly calculated new elements.

The first data structure counts how often each distinct value is present in the window of K preceding elements. This could be a simple array of K+1 integers (though I found Python's Counter class slightly more convenient).

The second data structure is an ordered collection of integers (between 0 and K, inclusive) that are missing in the same window. Of course, I want to take the minimum element from this list at each step, and I want to be able to update it efficiently. Therefore, a plain list isn't the right choice. Instead, I will use a heap structure, although an ordered binary search tree (like Java's TreeSet or C++'s std::set) would also be appropriate.

Note that the present and missing data structures complement each other: if a value is stored in missing, then its count in present will be zero. And vice versa: if a value is not in missing then it must appear in the current window, and its count in present will be nonzero

Now consider how these data structures are updated when the window slides to the right. First, to determine M[i] for an index i ≥ K, I can remove the lowest value from the missing set, and then increment present[M[i]], thus extending the window on the right by one element. To shrink the window on the left, I need to decrement present[M[i - K]]. If the resulting count has reached zero, that means M[i - K] doesn't occur anywhere else in the search window, and it should be added to missing.

The implementation in Python looks like this:

http://tweakers.net/ext/f/FTa9qZshu5GLjo7voP5frWLc/full.png

Since heap operations on a list of size O(K) take O(log K) time, this algorithm runs in O(K◊log K) time and O(K) space. Although this is fast enough for this contest, I suspect this is not optimal, and O(K) time should be possible too. If you know how to do it, please leave a comment describing your approach!

Facebook Hacker Cup round 1 problem analysis en

By Soultaker on Sunday 29 January 2012 19:18 - Comments (10)
Category: Programming Contests, Views: 20.974

Over the past 24 hours the first official round of the Facebook Hacker Cup was held, with thousands of qualified competitors competing for one of (at least) 500 spots to participate in round 2. There was a nice spread in topics for the problems and their difficulty was much more balanced than in the qualification round, which made for quite an enjoyable match with many competitors solving all three problems.

Like last week, I will analyze the problems in order of increasing difficulty. Spoilers abound, so don't read on if you want to try to solve the problems yourself!

Read more »