Balancing Multiplayer Games, Part 1: Definitions
Balancing a competitive multiplayer game is not for the faint of heart. In this article I’ll define the terms that will let us know what we’re talking about in the first place, then in the second and third articles, I’ll pretend that we have some hope of solving the wicked problem of game balance and I’ll explain techniques to do it. Then in the fourth article, I’ll try to impress upon you what deep trouble we’re really in.
First, the terms. Let’s start with balance and depth as defined by the Philosopher King of game balance:
A multiplayer game is balanced if a reasonably large number of options available to the player are viable--especially, but not limited to, during high-level play by expert players.
--Sirlin, December 2001
A multiplayer game is deep if it is still strategically interesting to play after expert players have studied and practiced it for years, decades, or centuries.
--Sirlin, January 2002
This definition of balance is pretty good, but there are two concepts hiding inside the term viable options. On one hand, I meant that the game doesn’t degenerate down to just one tactic, and on the other hand, I meant that if there are lots of characters to choose from in a fighting game or races to choose from in a real-time strategy game, many of those characters/races are reasonable to pick. Let’s call the first idea viable options and the second idea fairness in starting options, or just fairness for short.
Viable Options: Lots of meaningful choices presented to the player. For depth’s sake, they are presented within a context that allows the player to use strategy to make those choices.
Fairness: Players of equal skill have an equal chance at winning even though they might start the game with different sets of options / moves / characters / resources / etc.
Viable Options
The requirement that we present many viable options to the player during gameplay is what Sid Meier meant when he said that a game is a series of interesting decisions (a multiplayer competitive game, at least).
If an expert player can consistently beat other experts by just doing one move or one tactic, we have to call that game imbalanced because there aren’t enough viable options. Such a game might have thousands of options, but we only care about the meaningful ones. If those thousands of options all accomplish the same thing, or nothing, or all lose to the dominant move mentioned above, then they are not meaningful options. They just get in the way and add the worst kind of complexity to the game: complexity that makes the game harder to learn yet no more interesting to play.
For the sake of depth, we also hope that the player has some basis to choose amongst these meaningful options. If the game at hand is a single round of rock, paper, scissors against a single opponent, there is nearly no basis to choose one option over the other so it’s hard to apply any kind of strategy. And yet a game of Street Fighter might be decided by a single moment when you choose to either block, throw, or Dragon Punch, or a game of Magic: the Gathering might be decided by a single decision to play a Counterspell or not. These examples at first glance look like the rock, paper, scissors example, but the decisions take place inside the context of a match that has many nuances where each player is dripping with cues about his future behavior. In Street Fighter and Magic, the player does have a basis to choose one move over the other, and more than one choice is viable, we hope.
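To make the rock, paper, scissors point concrete, here is a minimal sketch. The payoff values (+1 win, 0 draw, -1 loss) and the "leans rock" opponent are assumptions made up for illustration. Against an opponent you know nothing about (modeled here as uniformly random), every move has the same expected payoff, so there is nothing to base a choice on. Once context gives you a read on the opponent, one move pulls ahead.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 for a win, 0 for a draw, -1 for a loss (illustrative scoring)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoff(mine, opponent_mix):
    """Average payoff of `mine` against a probability mix of opponent moves."""
    return sum(p * payoff(mine, theirs) for theirs, p in opponent_mix.items())

# No information about the opponent: treat every move as equally likely.
uniform = {m: Fraction(1, 3) for m in MOVES}
no_read = {m: expected_payoff(m, uniform) for m in MOVES}

# Hypothetical read from the match so far: this opponent favors rock.
leans_rock = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}
with_read = {m: expected_payoff(m, leans_rock) for m in MOVES}

print(no_read)    # every move has expected payoff 0
print(with_read)  # paper is now the best choice
```

The second case is closer to the Street Fighter and Magic situations: the moves are the same, but the surrounding match supplies the information that makes the choice strategic.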
Also for depth, we prefer if the meaningful choices depend on the opponent’s actions. Imagine a modified game of StarCraft where no players are allowed to attack each other. All they can do is build their base for 5 minutes, then we calculate a score based on what they built. There are many decisions to make in this game, and it might have several paths to victory, but because these decisions are purely about optimization--more like solving a puzzle than playing a game--they make for a shallow competitive game. Fortunately, in the actual game of StarCraft, you do need to consider what your opponent is building when you decide what to build.
While we require many viable options to call a game balanced, the requirement about giving the player a context to make those decisions strategically and the requirement that the decisions have something to do with the opponent’s actions are really about depth. They’re worth pointing out though because we should attempt to increase the depth of the game as we balance it, not decrease it.
Fairness
Fairness, in the context I’m using it here, refers to each player having an equal chance of winning even though they might start the game with different options. In Street Fighter, each character has different moves, in StarCraft each race has different units, and in World of Warcraft, each arena team has different classes, talent builds, and gear. Somehow, all of these very different sets of options must be fair against each other.
I want to stress that I am only talking about options that you’re locked into as the game starts. That’s a very important distinction. Options that open up after a game starts do not necessarily have to be fair against each other at all. Imagine a first-person shooter with 8 weapons that spawn in various locations around the map. Two of these weapons are the best overall, 3 are ok but not as good as the best weapons, and the remaining 3 are generally terrible but happen to be extremely powerful against one or the other of the 2 best weapons.
Is this theoretical game balanced? It certainly might be, meaning that nothing said so far would disqualify it. A designer could decide that he wants all weapons to be of equal power, but he need not decide that as long as each weapon is still a viable choice in the right situation. It might be fine to have two powerful weapons that players compete over, a few medium power weapons that are still ok, and some weak weapons that allow players to specifically counter the strong weapons. There could be a lot of strategy in deciding which parts of the map to try to control (in order to access specific weapons) and when to switch weapons depending on what your opponents are doing.
By contrast, a fighting game with 8 characters designed by that scheme is not balanced because it fails the fairness test. Players choose fighting game characters before the game starts, but they pick up weapons in the first-person shooter example during gameplay. Being locked into a character that has a huge disadvantage against the opponent’s character is unfair.
Games that let players start with different sets of options are inherently harder to balance because they must make those sets of options fair against each other in addition to offering the players many viable options during gameplay.
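Here is a rough sketch of the difference between the two tests, using the 8-option scheme from the example above. Every number in it is made up: the tier values, the 80/20 counter matchups, and the 10-point tolerance are assumptions for illustration, not measurements from any real game. It checks two things separately: whether every option is the favored pick in at least one situation (the viable options test), and whether every matchup stays near 50/50 (what fairness demands when players are locked in before the game starts).

```python
# Hypothetical power tiers for the 8 options: 2 strong, 3 medium, 3 weak.
TIER = {"A1": 3, "A2": 3, "B1": 2, "B2": 2, "B3": 2, "C1": 1, "C2": 1, "C3": 1}
# Each weak option hard-counters one of the two strong ones (assumed pairing).
COUNTERS = {("C1", "A1"), ("C2", "A2"), ("C3", "A1")}

def win_pct(x, y):
    """Made-up chance, in percent, that x beats y at equal skill."""
    if (x, y) in COUNTERS:
        return 80
    if (y, x) in COUNTERS:
        return 20
    return 50 + 15 * (TIER[x] - TIER[y])

def has_good_situation(x):
    """Viable options test: x is the favorite against at least one other option."""
    return any(win_pct(x, y) > 50 for y in TIER if y != x)

def worst_matchup(x):
    """The lowest win chance x has against any other option."""
    return min(win_pct(x, y) for y in TIER if y != x)

def fair_when_locked_in(tolerance=10):
    """Fairness test: every matchup is within `tolerance` points of 50/50."""
    return all(abs(win_pct(x, y) - 50) <= tolerance
               for x in TIER for y in TIER if x != y)

print(all(has_good_situation(x) for x in TIER))  # True: each option has its moment
print({x: worst_matchup(x) for x in TIER})
print(fair_when_locked_in())                     # False: lopsided matchups exist
```

As weapons you can swap during play, this set passes the first check. As characters chosen before the match, it fails the second, which is the distinction drawn above.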
Symmetric vs. Asymmetric Games
Let us call symmetric games the types of games where all players start with the same sets of options. We’ll call asymmetric games the types of games where players start the game with different sets of options. Think of these terms as a spectrum, rather than merely two buckets.
Symmetric                                      Asymmetric
<------------------------------------------------------->
Same starting options            Diverse starting options
On the left side of the spectrum, we have games like Chess. In Chess, each side starts with exactly the same 16 pieces. The only difference between the two sides is that white moves first. Because of this different starting condition, we shouldn’t say that Chess is 100% symmetric, but it’s damn close. If Chess were the only game you had ever seen, you might think that the black and white sides are played radically differently; white sets the tempo while black reacts. There are entire books written about how to play just the black side. And yet if we zoom out to look at the many games in the world, we see that the two sides of Chess are so similar as to be virtually indistinguishable when compared to two races in StarCraft, two characters in Street Fighter, or two decks in Magic: The Gathering.
The more diversity in starting conditions the game allows, the farther to the right of our spectrum it belongs. So asymmetry, as we mean it here, is a measure of a game’s diversity in starting conditions. This is not meant to be an exact science, so there is no specific formula to determine where a game belongs on this spectrum, but it’s a handy concept anyway.
Let’s look at a few examples. StarCraft has three very diverse races so it belongs toward the right side of our spectrum. That said, even if the three races were as different as imaginable from each other, the number three is small enough that we shouldn’t put it at the far right (admittedly, this is a judgment call). Fighting games can have dozens of characters that play completely differently and they tend to have more asymmetry than most other types of competitive multiplayer games.
That said, individual fighting games can vary quite a bit in just how asymmetric they are. Virtua Fighter, for example, is an excellent and deep fighting game, but the diversity of characters is relatively low compared to other fighting games. All characters have a similar template compared to Street Fighter where some characters have projectiles, or arms that reach across the entire screen, or the ability to fly around the playfield. Meanwhile, Guilty Gear, a fighting game you’ve probably never heard of, has more diversity than any other game in the genre that I know of. One character can create complex formations of pool balls that he bounces against each other, another controls two characters at once, another has a limited number of coins (projectiles) that power up one of his other moves and a strange floating mist that can make that powered up move unblockable. It’s almost as if each character came from a different game entirely, yet somehow they can compete fairly against each other. Guilty Gear is possibly all the way to the right of our chart because it has both wildly different starting options (characters) and many of them (over 20!).
Magic: The Gathering is also extremely asymmetric in the format called constructed where players bring pre-made decks to a tournament. The variety of possible decks is staggering and tournaments usually have several different decks of roughly equal power level, even though they play radically differently.
First-person shooters tend to be very far toward the symmetric side of the spectrum, usually offering the same options to everyone at the start, except for spawning location. Remember that picking up different weapons during gameplay, or even changing classes during gameplay in Team Fortress 2, does not count as asymmetric for our purposes. (Again, because those different options don’t need to be exactly fair against each other.) Also, first-person shooters that do have asymmetric goals for each side often make the sides switch and play another round with roles reversed so that the overall match is symmetric.
Now that we’ve mapped out where some games fit on our spectrum, remember that this is not a measure of game quality. If your favorite games appear on the left (symmetric) side, that does not mean they are bad. If you like StarCraft more than Guilty Gear, you do not need to be upset that Guilty Gear is “more asymmetric.” The spectrum is simply meant to give us an idea about how different the starting options of a game are, not about the depth or fun of the game.
No matter where a game appears on this spectrum, it still needs to offer many viable options during gameplay to be balanced. In addition to this, the farther a game is to the right of the spectrum, the more it needs to care about balancing the fairness of the different starting options. In the next part of this series, I’ll talk about how we can design games that make sure to offer enough viable options and in the article after that, I’ll explain how we can attempt to create fairness in those pesky asymmetric games.
Reader Comments (16)
Sirlin, I'm sure you're familiar with it, but other readers might be interested in the Complexity vs Depth article on the Game Design the Wrong Way blog. It uses a more formal sort of definition for depth than you use, although both definitions are compatible with each other.
I'm sorry Sirlin, but there's one glaring error in this article.
The Battleship in Monopoly breaks the game.
What? I can't believe how ignorant of Monopoly you are, Shoe is totally top tier! The only reason it isn't picked every time is because it's hard countered by Wheelbarrow, but no one picks Wheelbarrow unless they KNOW you're picking Shoe, it's just too much of a risk.
Just wanted to say the most asymmetric game I have ever played that is actually fair and balanced is Defense of the Ancients, a WC3 mod. It has 93 possible starting characters to choose from, and each of these is completely unique in its style of combat. The three basic class types (strength, intelligence and agility) form a rough paper/rock/scissors in that usually int>agi>str>int. But there are plenty of characters that run counter to this. It is so even that I usually play an all random game, where everyone is assigned a random character, leading to 48398230717929318249 possible starting combinations (10 players), and yet still I never feel like we can't win because of what our team or their team has.
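The figure quoted in that comment works out to 93 to the 10th power. That is the count you get if each of the 10 player slots is assigned one of 93 characters independently, with repeats allowed and player order mattering. That reading is an assumption, since the comment does not say how the number was derived. A short check, with the no-repeats count alongside for comparison:

```python
import math

HEROES = 93   # starting characters, per the comment
PLAYERS = 10  # player slots in one game

# Each slot assigned independently, repeats allowed, order matters.
with_repeats = HEROES ** PLAYERS

# Same, but no two players may receive the same character.
without_repeats = math.perm(HEROES, PLAYERS)

print(with_repeats)
print(without_repeats)
```

Either way, the space of starting conditions is far too large to balance matchup by matchup.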
I think the asymmetry of DotA would have somewhat more to do with the different types of abilities (direct damage, snaring, movement, etc. [I don't play dota]) rather than simply the number of meaningful characters to choose from.
the depth in dota gameplay has somewhat to do with RPS, but more along the lines of team synergy and the roles each type play. while obsid ascribes the RPS system in dota to main character attributes, it would be more accurate to say that characters in dota are better classified as early, mid and late game characters. The specific team compositions, positioning, and opponent matchups (in each lane) determine what happens in the gameplay, as well as the different item/skill builds players choose.
I'd imagine any game will increase in 'depth' naturally with time provided it remains popular enough to be of interest.
While I agree that a game needs to have enough viable options, I think it is important to note that having some options that are 'traps' can be very useful. In Magic: the Gathering, for example, the designers intentionally have 'bad' cards, or cards that are sub-optimal.
There are quite a few articles from the designers themselves, but the main point they make is that it helps teach new players. As long as the 'trap' option is designed well the player can learn why a certain strategy or type of card doesn't work as well as they thought without becoming overly frustrated. This is why not every single card printed is viable in constructed (or even limited*). Do you think 'trap' options are a good idea in other kinds of games, or even in Magic: the Gathering?
* limited is a format where players open packs of cards and end up building a deck based on what they open. Normally some cards are better in limited than in constructed because of deck size, consistency, and card availability, among other things.
Actually Mark Rosewater's statements that they have bad cards on purpose so you can learn that they are bad is such an absurdly poor argument that it's been the butt of jokes for YEARS on this site. I think it's a terrible idea. Imagine adding fighting game moves that suck so you can learn not to use them. Or RTS units that suck so you learn not to use them. Or MTG cards that suck so you learn not to use them. Ridiculously stupid in all cases.
Normally I would say Rosewater's argument is actually intellectually dishonest, because clearly the reason to have bad cards is to enable their despicable business model where they need tons of trash cards in order to make the whole "one rare per pack of random cards" thing work, but in this case I don't think it's intellectually dishonest. I think he's so deep into that particular bad concept of design that he actually believes it.
Anyway, no, please do not put intentionally bad moves in your game. It makes your game bloated and inelegant, just as described in my article above.
Regarding MTG: do you think that a certain amount of intentionally bad cards always harms a trading card type game? Hypothetically, let's imagine that the cards were free and that all cards were earned through gameplay, kind of like a levelling system. Then over time, you cycled out the bad ones from your deck for better ones. Perhaps I even send those bad cards into an AI-controlled deck to help me in 2v2 battles, just to give them a more interesting use than simply selling or trading them away.
Now the big difference I think here is that, in MTG, the tiers (common, uncommon, rare) are not terribly well balanced within a given tier. Could balancing those individual tiers make the game better? Even if you did balance those tiers, the resource costs (mana costs in MTG) between tiers are still unbalanced. To offset that, you give decks a level or rating based on the total tier value of the cards contained in it, in much the way as an army has a point value in Warhammer 40k. Similar deck values should be relatively balanced, at least much more so than in MTG.
Perhaps this strays a lot from designing bad cards as a learning tool point. Honestly, I've not read much on Rosewater's reasoning or lack thereof. But I have run into these ideas in my own designs and I'm looking to bounce ideas around. Thanks
The short answer is I think intentionally bad cards are fundamentally a bad idea in a CCG, and that they really only exist for business purposes to indirectly charge you more money. They clutter up design, and design-based defenses of them are either disingenuous or simply invalid. They are a terrible teaching tool, and if anyone really cared about teaching something, intentionally bad cards would certainly not be the method they'd choose.
In a fixed deck game like Yomi, it is ok to have some cards intentionally kind of bad though. That's just the built-in weakness of a deck, similar to the concept that Dhalsim in Street Fighter doesn't have a dragon punch, that's one of his built-in weaknesses. Similarly, a CCG card could be somewhat bad yet still justified if it's because it's in a certain color, or something. For example, maybe the color red has advantages like direct damage, and has the drawback of weaker creatures. A red creature might be "bad" compared to a green one, but not bad in the overall scheme of a red deck's ability to win. If you include a red creature that is obviously intentionally bad compared to other red creatures though, that's cluttering up the design, exists only as cruft to inflate the game's price indirectly, and all the other usual bad stuff.
You gave an example where you start with bad cards and then build up to better cards over time, using the old bad cards in some alternate way in a separate deck. If you meant that as a 1p mode, sure that is maybe possibly ok. Kinda weird, but could work if you executed it well, maybe. If you meant that as a competitive 2-player thing against other humans, that would go against what I think is an important concept of any competitive game: the time you spend CAN (and SHOULD) result in your own personal skill increasing, but the time you spend SHOULD NOT give material advantages. It's no longer a "real competitive game" if it does. The level playing field (of material advantage, not of skill) is a basic requirement for me to take a competitive game seriously.
Thanks for the prompt reply. Now that I've had a chance to regather my thoughts, I guess what I'm really wondering is if there's any merit to the card and deck value system I proposed. As an arbitrary example for three levels of cards: level 1 = 10 pts, level 2 = 20 pts, level 3 = 30 pts. Assuming each level was itself balanced, if deck values represent the total card values, then shouldn't two players with same deck value be balanced?
In this case then, you should only be competing against people on equal footing, and that have upgraded from the same number of level 1 (bad) cards. Is this not the case, or is any levelling system too plastic for serious strategy?
Thanks
If you mean like, all players get 500 points to allocate, then yeah everyone would be on equal footing. Including some bad cards becomes an interesting built-in weakness of your deck, and that sounds good. If instead you mean that over time your max points to allocate goes up and up, and at any given moment you play only vs people with the same number of max points...not as great. Not too bad because it's a level playing field, but matchmaking for this game sounds terrible. It's hard enough finding opponents when we're all playing the same game, but now the game is divided into many, many sub-games, like with 200 points, 210 points, 220 points, etc.
Even apart from the matchmaking problem, it's unfortunate that we are all playing such a different game from a competitive standpoint. I mean it's not TOO bad, but it's kind of nice to have a single standard like chess or starcraft or street fighter or whatever, where beginners and masters all access the same in-game resources as each other. Your idea does build in the idea of progression, and people like that, so it could be successful. In fact, it probably would be successful. I'm just kind of wary of it for the two reasons above.
That's really good advice Sirlin. Yeah, matchmaking will be hellish. We'd likely have to have some kind of range, so maybe a reward system would compensate if you beat more valuable decks. I am starting to see the wishy-washiness of trying this kind of varied tier system versus the elegance of a single-tiered game like most competitive games. I'll have to approach my team and come to some sort of consensus on whether the added player progression is worth the cost to fairness.
By the way, your site is stellar. I've barely scratched the surface. I really love your books section; I've a busy summer ahead.
I'm looking for a simple, 2d tactics game for PC with AI opponents. The wargames that I've found are too large and too complicated for learning the essence of strategy and tactics. Is there one that you would recommend as a learning tool?
Thanks again.
A random third party perspective:
Having played Battletech (but not Warhammer 40K), the idea of "your army/deck/team/whatever's value is the sum of the values of its parts" seems inherently flawed.
I don't know if you're familiar with Battletech's BV system, but it leads to a ton of min/maxing and can also lead to silly ratings (for instance, a mech with a ton of guns but not enough heat sinks to shoot any of them without melting down, isn't heavily downgraded in the BV system when it's actually terrible in combat).
The other problem is that summing things in general does not really reflect the reality that more forces grow exponentially in power. Like 2 Mad Cats are WAY more powerful than 1 Mad Cat twice over because they can flank, fire at the same time, cover each other etc. Battletech kind of patched that with a "Force multiplier" but it's too simplistic. 1 Mad Cat gets penalized the same amount if you add a Mad Cat to it as if you add a crappy scout mech.
If anything, Battletech-wise, I think mechs should be multiplied by each other instead of added to each other.
Not sure how/if any of this would apply to a CCG, but I can imagine some very similar issues occurring. For instance if it were MTG and you had a max number of points in your deck, I would think spending all your points on a couple of rares that combo well together and a couple of cards to get that combo and then a giant stack of trash would be much, much more point efficient than having a consistently mediocre deck, half of whose cards you never see in a game.
I've not tried Battletech's system, so I'm also speaking generally. In my design, what I'm thinking is that there are three tiers; all units/cards in each tier must be completely balanced, almost as if you're designing three separate games.
Within a single tier, some units may very well be min-maxed, but that needs to be accounted for in the balance. Beyond simply looking at the numbers, units really need to be rigorously play-tested to see if they really are balanced. I see that as a problem to be addressed within a given tier, and not cross-tier.
I think that same logic would apply with balancing tiers. It's a matter of figuring out HOW much better tier 2 is than tier 1. That should give you SOME idea of how to assign values to tiers. To address the min-max of Rares+Garbage cards, I think the multiplier is definitely the way to go. For example, if the multiplier is say 10X between tiers [T1 = 1, T2 = 10, T3 = 100], then your deck ratings would be sufficiently spread out to limit that deck-building tactic.
Let's say the deck size is 50 and in Deck 1 you want to have some T3 and mostly T1, avoiding T2 altogether.
Deck 1 has: 10x T3 = 1,000 and 40x T1 = 40, for a total of 1,040.
Composition is: 20% T3 and 80% T1.
Deck 2 uses T3 for flavour, but is mostly composed of T2.
My deck is: 45x T2 = 450; 5x T3 = 500; for a total of 950 points.
Composition is: 90% T2 and 10% T3.
Personally, I'd much rather be drawing T2 90% of the time than T1 80% of the time. Maybe if you get that super T3 combo, you'll win, but that deck simply won't win very often.
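The arithmetic in this example can be sketched in a few lines. The 10X tier values (T1 = 1, T2 = 10, T3 = 100) and the two 50-card decks come from the comment above; the linear 10/20/30 values are the ones proposed earlier in the thread. The function names are just illustrative. Running both schemes on the same decks shows why the multiplier matters: under the linear values the "rares plus trash" deck is the cheaper of the two, while under the 10X values it becomes the more expensive one.

```python
MULTIPLIED = {"T1": 1, "T2": 10, "T3": 100}  # 10X between tiers
LINEAR = {"T1": 10, "T2": 20, "T3": 30}      # earlier 10/20/30 proposal

def deck_value(deck, tier_values):
    """Total point value of a deck given as {tier: card count}."""
    return sum(tier_values[tier] * count for tier, count in deck.items())

def composition(deck):
    """Share of each tier in the deck, in whole percent."""
    total = sum(deck.values())
    return {tier: 100 * count // total for tier, count in deck.items()}

deck1 = {"T3": 10, "T1": 40}  # some T3, mostly T1, no T2
deck2 = {"T2": 45, "T3": 5}   # mostly T2, a little T3

for name, deck in (("Deck 1", deck1), ("Deck 2", deck2)):
    print(name, composition(deck),
          "10X:", deck_value(deck, MULTIPLIED),
          "linear:", deck_value(deck, LINEAR))
```

This only prices the decks. It says nothing about whether equal prices mean equal strength, which is the part that would need playtesting.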
Now I took these numbers out of thin air; 10x is probably really far off the mark. But as long as the tiers are balanced, there should be an optimal multiplier, however difficult it is to find. I think what's most telling about this thought experiment is just how difficult it would be to fairly implement a system like this. The deck ranges could be absolutely huge, and matchmaking would suffer. I think it could work, and the sense of progression would be awesome. It may be out of our skill range to accomplish it though. Time shall tell.