Calculating Factorio

Factorio is a game in which you dig resources out of the ground, craft those resources into items, then craft those items into other items, following a graph of recipes of considerable complexity. Good planning can be tremendously helpful when building out your factory, and a key part of that is knowing how much of any given item you will need to make as a prerequisite for more advanced items down the line. The first step of such planning is calculation.

1. Basics
2. Complex recipes
3. Handling multiple unknowns
4. No solution found
5. Linear programming

Basics

As a simple example, let's look at the recipe for one of the first items in the game, the electronic circuit:
Electronic circuit
Craft time: 0.5 seconds
Ingredients:

Recipes consist of three major things: A crafting time, one or more products, and one or more ingredients. Products and ingredients consist of an amount and an item.

The vast majority of the recipes in the game will produce a single instance of a single item. Several will make multiple copies of an item at a time, and a bare handful will make multiple distinct items at once. The recipe for the electronic circuit, shown above, will produce a single electronic circuit each time it is completed.

Copper cables and iron plates have their own recipes, in turn.

2x Copper cable
Craft time: 0.5 seconds
Ingredients:

Iron plate
Craft time: 3.5 seconds
Ingredients:

And then copper cables require copper plate, the recipe for which looks like this:

Copper plate
Craft time: 3.5 seconds
Ingredients:

Iron ore and copper ore don't have a recipe, and are mined directly from deposits in the game world.

If one were to represent the complete tree of items and the ratios required of each of their prerequisites, it would look something like this:

Notice how electronic circuits require three copper cables, but the recipe for copper cables produces two of them at a time. This means that, even though items in Factorio are indivisible units, a single electronic circuit effectively requires one and a half units of copper ore. More concretely, it means that making two electronic circuits will consume three pieces of copper ore. And because electronic circuits and copper cables have the same crafting time, this means that you can build a ratio of 2 electronic circuit assemblers to 3 copper cable assemblers in your factory to achieve perfect throughput.

The relationship between crafting times and ratios is straightforward, but deserves mentioning. The ratios shown above are best thought of as a series of rates. For instance, if you wanted to produce one electronic circuit per second, then you would also need to make three copper cables per second, one-and-a-half copper plates per second, and so on. These rates would remain constant regardless of the crafting times of the individual recipes. Recipes with longer crafting times will simply require more assemblers or furnaces working in parallel in order to satisfy the desired rate.

(There is the additional detail that different types of assemblers and furnaces have different crafting speeds. Working through the arithmetic here is outside the scope of this post, however.)

A tree similar to the above is sufficient for calculating the production ratios between the majority of the recipes in the game. However, the complete recipe graph hides a few complex corner-cases, and the technique shown above will fail once it runs into the weirder parts of the graph.

Complex recipes

The tree shown above is the result of a simple graph traversal, multiplying ratios as ingredients are needed, and dividing them as products are found. Given that the vast majority of items in the game are produced by a single recipe, and that the vast majority of recipes produce a single item, this can get you quite far before you run into any ambiguity as to where an item is supposed to come from.

But, inevitably, you will run into the oil production pipeline.

Basic oil processing
Craft time: 5 seconds
Produces:
Ingredients:

Advanced oil processing
Craft time: 5 seconds
Produces:
Ingredients:

Heavy oil cracking
Craft time: 5 seconds
Produces:
Ingredients:

Light oil cracking
Craft time: 5 seconds
Produces:
Ingredients:

What a pain in the ass. These recipes violate a number of assumptions that it would be easy to make if you looked exclusively at the non-oil parts of the recipe graph.

Notice how the fluids produced by these recipes (heavy oil, light oil, and petroleum gas; collectively referred to as the oil products) are each created by multiple recipes. The basic and advanced oil processing recipes are also in that exclusive class of recipes which produce multiple, distinct items.

There are some other recipes closely related to the oil pipeline which add to the annoyance. Each of the three oil products may be turned into solid fuel, via three additional recipes. There is also a system in which crude oil may be placed into barrels, and then the full barrels may be emptied again. This is accomplished via a pair of recipes that depend on one another. As a consequence, the recipe graph is not a DAG. Any algorithm that wants to work with the recipe graph in a general sense must be prepared to deal with cycles.

So: Suppose we want to make an item that depends on one of the oil products. A good example would be plastic.

2x Plastic bar
Craft time: 1 second
Ingredients:

If we want to produce plastic bars at a given rate, we will require petroleum gas at 1.5 times that rate. However, we can obtain petroleum gas from three different recipes, and some of those recipes ultimately depend on each other, and they also end up producing the other oil products at the same time, which we can turn into more petroleum gas via the cracking recipes.

How can we calculate the optimal selection of recipes to produce this with? And what if we actually want to use more than one oil product, at some specific ratio?

It turns out that this is a straightforward problem of linear algebra. The key is representing the recipes in a useful way. We can encode the recipe graph using a matrix like the following.
-400301000
30-30304500
020405500
-30-300-5010
00-100-10001
The rows of this matrix represent the individual fluids involved with the oil production pipeline. The columns represent the recipes. Positive values represent the products of recipes, and negative values represent ingredients. The two recipes on the right represent the raw resources that serve as inputs. These aren't real recipes in the game, but it is useful to model resources using pseudo-recipes that produce them at no cost.

Now let's use this matrix to determine the optimum ratios for some series of oil products. Completely arbitrarily, let's say we want to produce 45 units of petroleum gas, and 10 units of heavy oil, per unit time. We can represent this as an item vector:
10
0
45
0
0
If we tack this onto the recipe matrix in order to form an augmented matrix, and then place that matrix into its reduced row echelon form, we get this result:
result
10000-13/400-23/12
01000-7/400-1/4
00100-3/50-10/3
000101/2010/3
000011305/3
The recipe matrix has five rows and six columns. Therefore, any solution to that matrix will still contain one unknown. Intuitively, the presence of this unknown represents the fact that you have to make a choice between the two oil processing recipes. There is an extent to which they may be substituted for one another, and so there can be no exact numerical solution until we can decide how to choose a value for this one remaining unknown.

To get a better sense of what this looks like, let's plot the system of equations represented by the above matrix.

In mathematical terms, let's call the recipe matrix A, the vector of ratios we want to solve for x, and the vector of desired outputs b. Together, these satisfy the relationship Ax = b. However, it is very important that all of the components of x are positive. A negative value in x would represent trying to operate a recipe in reverse, which is not something that makes sense in the context of the game.

Thus, the vertical lines in the above plot represent the minimum and maximum values for xcrude oil such that all of the values in x will be greater than or equal to zero. Or, in game terms, they represent the minimum and maximum amounts of oil we could spend in order to produce 45 units of petroleum gas and 1 unit of heavy oil, with no other oil products left over.

In order to choose a value for xcrude oil, we need to establish an order of priorities. Ultimately, we are trying to find the most efficient production chain for the items we want to make, where efficiency is defined as using as few resources as possible. But, as we can see with water and crude oil in the above plot, sometimes resources are used in inverse proportion to one another. Using less crude oil would mean using more water, and vice versa.

Given that crude oil can be annoying to acquire, and water is effectively free, it makes sense to define, a priori, that optimizing for crude oil has a higher priority than optimizing for water. Therefore, we can say that the optimum value for xcude oil is the minimum possible one, as shown in the plot above.

Taking that value as a solution, our final value for x is this.
0.000.780.210.3842.6958.97
Huh, interesting. Heavy oil cracking turned out to be zero. I wonder if that's significant?

Handling multiple unknowns

The complete recipe graph in vanilla Factorio (0.15.2) has 204 items and 215 recipes. This means that a solution involving that entire graph could have as many as 11 unknowns. How do we decide on concrete values in this case?

Let's start with a more constrained example. Let's take the oil production matrix above, and add the recipes for solid fuel.
-4003010-200000
30-3030450-10000
020405500-2000
000011100
-30-300-5000010
00-100-10000001
Suppose we arbitrarily want to make solid fuel at a rate of 100 items per unit time. Sticking this onto the recipe graph and finding its RREF, as above, gives us this:
result
1000005/2-1/20-33/400-50
0100005/22-3/220-137/4400-450/11
00100018/11-1/55-43/550-600/11
000100-18/111/553/44600/11
000010-37/119/1109/110500/11
00000148/11-9/110-9/110600/11
We're left with three unknowns, which is to be expected, given that there are six items and nine recipes. Making sense of these numbers requires taking a step back and looking at what the recipe matrix actually means.

Items come from recipes. In the normal case, each item is produced by one recipe. The oil production pipeline violates this assumption, by having three items that are produced by four recipes. Thus, its recipe matrix is not square, and calculations involving oil end up with a single unknown. In practice, regardless of the order in which you happen to arrange the columns of the matrix, this unknown is really asking: Which oil product do I need to use the least?

To put that another way: Having more recipes than items results from having multiple ways of making an item. The unknown that results from trying to solve the resulting recipe matrix represents the choice between those recipes.

There are three possibilties when trying to choose between these recipes.

  1. All possibilities reduce to different proportions of some underlying resource. The most efficient possibility should win.
  2. All possibilities reduce to the same quantities of the underlying resources. They are equal, and one may be chosen arbitrarily.
  3. Each possibility reduces to different underlying resources. The possibility that uses the lower-priority resource wins.
More plainly: Each unknown represents a choice to be made somewhere, and one of the options will win. In mathematical terms, each unknown means that an element of the vector x will be zero.

We saw this in the last section, when the ratio for heavy oil cracking turned out to be zero. This result can be understood intuitively: We required heavy oil as part of the output, and advanced oil processing produces heavy oil at an extremely low ratio, while making the other products at higher ratios. It therefore follows that production of heavy oil would be the bottleneck, and we would have no excess heavy oil to pass on to the heavy oil cracking recipe.

There is always a bottleneck.

Applying that same intuition to the result above, we can guess which three recipes will end up being zero. First, notice that light oil produces solid fuel at a vastly more advantageous ratio (10:1) than either of the other two products (20:1). Then notice that the ratio for converting from heavy oil to light oil (40:30) is more advantageous than converting from heavy oil directly to solid fuel.

So: The first zero recipe is the solid fuel from heavy oil recipe. It's more efficient to convert all of our heavy oil to light oil.

On the other hand, we have petroleum gas, which we will be producing as an unavoidable byproduct of the oil processing recipes, and so we're forced to make a certain proportion of our solid fuel from it. Converting our light oil into petroleum gas would be much less efficient than just converting all of the light oil directly into solid fuel, and so the second zero recipe is light oil cracking.

Finally, given that we really want all of the light oil we can get our hands on, we can look back at the oil processing recipes and note that basic oil processing makes less of it than advanced oil processing does. Thus, the third zero recipe is basic oil processing.

But how can we calculate this automatically?

In the previous section, we essentially determined the optimum value for x by trying to zero out each value in x, and checking whether the result was valid. This yielded two valid results (the minimum and maximum shown in the graph), from which we chose the optimum one.

This can be generalized for multiple unknowns by choosing every combination of N elements from x, finding solutions for the rest of x when those elements are zero, and then seeing if that solution is valid (i.e. if it contains no negative elements). This will, more than likely, yield multiple valid solutions, and so the final step is to choose the most optimal solution.

This process can be optimized slightly by only considering those recipes that have at least one product that is produced by multiple recipes. The only recipes that can be considered to be zero in this process are those which have alternatives. The game's complete recipe graph has 215 recipes with 11 unknowns. Computing this naively would yield over 8×1017 possibilties. However, the game only has 22 recipes which actually need to be considered, which brings the number of possibilities down to 705,432. Further reductions in the number of possibilities are possible as well, since some combinations of zeros are mutually exclusive, but this is outside the scope of this post. (The bulk of these excess recipes come from the fluid barreling system, which can largely be ignored as redundant in most production chains.)

For the purposes of our example, all this means is that we do not need to consider the possibility that water or crude oil will be zero (although they may be zero as a natural consequence of other columns being zero). If we take our 7 remaining recipes and set three of them to zero at a time, we get 35 possible solutions, of which the following are valid:
0.000.000.0012.906.4558.0635.48645.161290.32 The "don't crack anything" solution.
0.000.0015.380.0023.0846.1530.770.001538.46 The same, but with basic oil processing.
0.0031.580.0021.0510.530.0089.472000.002105.26 Crack all the light oil to petroleum gas.
0.0022.2222.220.0033.330.0066.67666.672222.22 The same, but with basic oil processing.
3.120.000.0012.500.0065.6234.38718.751250.00 The winner!
10.340.0013.790.000.0072.4127.59310.341379.31 Crack all the heavy oil, but use basic oil processing.
5.5638.890.0022.220.000.00100.002444.442222.22 All petroleum gas, all the time.
20.0046.6726.670.000.000.00100.002000.002666.67 The same, but with basic oil processing.
And would you look at that? The zero entries for the optimal solution are the ones we predicted.

No solution found

Sometimes a solution can't be found.

If we think of the recipe matrix as a vector space, there are some places we just can't reach. In the vanilla game, this is entirely the fault of the oil processing recipes.

To illustrate this, here is a somewhat simplified oil processing recipe graph, with only heavy oil and light oil.
-403010
303045
It is not possible to produce every combination of heavy oil and light oil. We can't make more heavy oil than light oil; there simply isn't a recipe available to do that. The interactive graph below illustrates this.

0
0
0

The situation with the complete recipe graph is analogous, except with a third dimension representing petroleum gas.

This can be overcome by introducing new vectors which can "waste" any given item. So long as these matrix columns are treated with the lowest priority, they should be used only when required.

Linear programming

The naive linear algebraic approach described above is simple, but has some clear shortcomings. As the recipe graph grows more lopsided, there is a combinatoric explosion of possible solutions, which quickly becomes computationally prohibitive. Luckily, we have linear programming.

The chief difficulty of representing a Factorio recipe graph as a linear program lies in crafting the objective function. The linear algebra approach described in the previous sections has the advantage of returning each possible solution as a tuple of costs, which may be compared to each other in whichever way is found most convenient.

When specifying the recipe graph as a linear program, however, the costs have to be specified as a linear combination, which ultimately has to yield a scalar value. There is a way to do this, but first let's go through the relatively simple example of the oil processing pipeline.

Here's a matrix we used earlier:
output
-40030100010
30-303045000
02040550045
-30-300-50100
00-100-100010
In linear programming terms, we want to minimize our resource expenditures, while producing at least the rate of items we're interested in. In other words:

Minimize C = xcrude oil subject to:

-40x1 + 30x3 + 10x4 ≥ 10
30x1 - 30x2 + 30x3 + 45x4 ≥ 0
20x2 + 40x3 + 55x4 ≥ 45
-30x1 - 30x2 - 50x4 + xwater ≥ 0
-100x3 - 100x4 + xcrude oil ≥ 0
∀i: xi ≥ 0
Note that, at this point, we are specifying crude oil as the only cost because we know ahead of time that we want it to be so. I will address the general case of determining a cost function later.

The tableau for our linear program looks like this:
s1 s2 s3 s4 s5 C answer
-400301000-10000010
30-303045000-100000
02040550000-100045
-30-300-5010000-1000
00-100-100010000-100
0000010000010
This program is not in the standard form, but we can get there with a bit of number-wrangling. Applying the simplex method to the matrix from there is fairly trivial. We ultimately end up with the following:
s1 s2 s3 s4 s5 C answer
-400/1300001-50/39-80/117-40/39 0-102300/39
-7/13100001/3905/234-7/39000061/78
400/130001021/135/39-17/13-100555/13
20/13001002/65-2/195-1/650005/13
-24/1301000-17/3902/5851/1950008/39
400/130000050/3980/11740/39011-2300/39
Or, in plainer terms:
0.000.780.210.3842.6958.97
Notice that this is the same result that we obtained earlier.

Now let's circle back to the cost function.

Ultimately, we want to prioritize our resource usage according to an a priori notion of how "difficult" each resource is to acquire. Crude oil is somewhat difficult to obtain compared to water, which is effectively free. Therefore, our cost function should value crude oil more highly than water. The question is: How much more?

Let's see what happens to the above solution if we define water to have a fixed cost of 1, while adjusting the cost of crude oil:

As we increase the cost coefficient of oil, the solution jumps, discontinuously, from one value for x to another. Past a certain point, it settles on a single solution, which will remain the same as the cost coefficient increases to infinity. This solution is the same one we get when we set water's cost to 0 and crude oil's cost to 1.

The cost function is a linear function, and the ratio of these costs defines the slope of the resulting line. As that slope changes, the cost function will reach different vertices on the boundary of the feasible region. Once the slope exceeds that final critical value, the vertex it will reach will remain the same no matter how much steeper that slope becomes, all the way up to infinity (a completely vertical line).

It follows, then, that the ratio in the costs between these resources may be any value that we know to be greater than or equal to this highest critical ratio. Actually calculating this critical value basically requires a linear program of its own, but it turns out we can assume that it will always be less than the ratio between the largest absolute value in the linear program and the smallest. In the example we've been using, that ratio is 10.

To restate our goal, we want our cost function to represent a lexicographical ordering of our resources: First minimize our use of crude oil, then (if multiple solutions would use equal amounts of crude oil) minimize our use of water. Given this, we can revise our cost function to:

C = xwater + 10xcrude oil
The resulting solution remains the same as above.

Now let's discuss the surplus variables. These represent quantities of items which we produce as surplus, in addition to the amounts we requested. While our priority is still, first and foremost, to minimize our resource expenditures, we also want to avoid making extra items that we don't need to.

For example, if a solution requires producing surplus petroleum gas, this cost function means that converting all of that excess petroleum gas into solid fuel and then claiming that as surplus instead will be an equal solution.

We can avoid this by introducing a new pseudo-item that it costs all recipes to use, and minimizing that use of this value at the lowest priority. Let's call this the recipe tax.

C = 10xwater + 100xcrude oil + xtax
And our tableau with this new pseudo-item and cost function ends up looking like this:
tax s1 s2 s3 s4 s5 C answer
-4003010000-10000010
30-3030450000-100000
020405500000-100045
-30-300-50100000-1000
00-100-1000100000-100
11110010000000
00001010010000010
Notice that this "tax" has no surplus variable. We can be assured that it will balance out to zero. And, once again, the solution we obtain remains the same as above. (Along similar lines, we don't actually require surplus variables for the input resources, but it does no harm to include them.)

One possible additional feature is placing multiple input items at the same priority level. If we define the ratio described above as R, and the number of items at each priority level as N, then the coefficient for each priority level can be defined something like this:

Pn = Pn - 1 × R × Nn - 1
Where P1 is defined to be 1.