MapReduce in The Farmer Was Replaced

Code discussion. Part I.

27 years ago Google was born. One of its key ideas was PageRank – a “credibility” score for a web page. (the paper) To compute it quickly, you need a lot of computational power. One option is to build a specialized supercomputer, but Google as a company rejected that path. Instead, they decided to use large arrays of commodity‑class hardware, connected into a network. You have to carefully orchestrate programs that run on such a platform, and back then writing distributed computation was almost like witchcraft.

In 2004 a new era of distributed computation started – the era of MapReduce platforms (see the paper by Dean and Ghemawat). The idea is simple: split your data into pieces, run the same “map” function in parallel on many machines, then combine (“reduce”) the partial results into one final result.

This post is about adapting that approach to the sunflower‑collection challenge in the game “The Farmer Was Replaced”.

The in‑game task has three stages: prepare the soil, plant sunflowers, harvest. We want to use the 5× bonus, so we try to collect sunflowers in order of measure(): from highest to lower. I’m talking about a late part of the game, when I have a 32×32 field and 32 drones, so I can assign one drone to each column or row.

I wanted a tiny “framework” to assign tasks to drones and run them in parallel. Because we take measurements, we also need to collect data from different drones and merge it. The core of this framework is the function drone_map_reduce from the script.

The function drone_map_reduce takes three parameters:

⌖ objects – things to process
⌖ f_map – a function that processes a batch of objects
⌖ f_red – a reducer function that combines results from each batch into one big result

GitHub reference

The drone API we rely on:

⌖ spawn_drone(f) – starts a drone running function f and returns a drone id
⌖ a drone “disappears” when f finishes
⌖ wait_for(drone_id) – waits for the drone to finish and returns its result

First we create slots for drones and distribute tasks between them:

def drone_map_reduce(objects, f_map, f_red):
    drone_jobs = []

    for _ in range(max_drones()):
        drone_jobs.append([])

    ind = 0
    for q in objects:
        drone_jobs[ind % len(drone_jobs)].append(q)
        ind += 1

Now each element in drone_jobs is a batch of work for one drone (or for the main thread).

Then we start the drones and also process some batches in the main thread:

    results = []
    drones = []

    while drone_jobs:
        while drone_jobs and num_drones() < max_drones():
            current = drone_jobs.pop()
            drones.append(spawn_drone(f_map(current)))
        if drone_jobs:
            current = drone_jobs.pop()
            results.append(f_map(current)())

After that we collect all results from the drones:

    for current_drone in drones:
        results.append(wait_for(current_drone))

And finally we combine everything into one result using f_red:

    reduced = None

    for r in results:
        if reduced is None:
            reduced = r
        else:
            reduced = f_red(reduced, r)

    return reduced

A few important details:

⌖ To create a valid drone job, f_map expects a list of the same type of elements that you store in objects.
⌖ We create up to max_drones() batches, so in the inner loop we just spawn drones until num_drones() reaches max_drones().
⌖ The code still works when the number of tasks is less than the number of drones (for example, with set_world_size(3)); some batches just end up empty.
⌖ If your drone function doesn’t return anything (None) and you don’t care about results, you can pass a dummy reducer like no_reduce that ignores its arguments. Or even just None value for reducer function because it will be never called.

Next time: how to build actual game logic on the top of this instrument.