Sunday, Nov 9, 2008

Street Fighter HD Remix Design Overview

Super Street Fighter 2 HD Remix was originally going to be a graphical update of Super Street Fighter 2 Turbo, but along the way some magic happened. HD Remix is now a completely new Street Fighter game—the 6th installment in the SF2 series. You get the classic gameplay of Super Turbo and the new HD Remix game in the same package, now available on both Xbox Live Arcade and PlayStation Network.

Guile's new Flash Kick is pretty crazy in Street Fighter HD Remix.

I'd like to explain how the gameplay changed in SF HD Remix and the reasoning behind all the changes, but first I'll introduce myself. I'm David Sirlin, and I oversaw all the gameplay in HD Remix, so I deserve both your praise and criticism for all game balance issues. I've played Street Fighter since Street Fighter 1. I've competed in Street Fighter tournaments for 16 years and for 11 years I've helped organize and run the tournament series that started out as B3 and has now become the international Evolution Championships. I represented the United States in Super Street Fighter 2 Turbo in Japan's Super Battle Opera tournament and I narrated much of Bang the Machine, a documentary film about the Street Fighter community. For years, I've been a caretaker of the franchise, helping to present the games in the best way in Capcom Classics Collection 1, 2, and Remixed.

And then I had the honor and burden of improving upon what I consider the very best Street Fighter game ever: Super Turbo. Many people said it's impossible to improve upon the polished gem of ST and there were lots of obstacles to even getting this new gameplay in the game. Dozens and dozens of times people told me I couldn't do it, wasn't allowed to do it, and other discouraging things. Wayne Gretzky said, "You miss 100% of the shots you don't take," so I took my shot.

Here were the design goals:

1) Make the game easier to play—more inclusive rather than exclusive
2) Make the game even more balanced for tournament play
3) Add fun as long as it doesn't interfere with #2.

Easier Controls

Inside Street Fighter, there is a wonderful battle of wits, but many potential players are locked out of experiencing it because they can't dragon punch or do Fei Long's flying kicks, or whatever other joystick gymnastics. I'm reversing the trend. There's only so far I can go with this and still call it SF2, but wherever I could, I turned the knob towards easy execution of moves. Let's emphasize good decision making—the true core of competitive games—and get rid of artificially difficult commands.

This will get more players interested in the game, eventually leading to more competition. It will also get players past the awkward beginner phase faster and into the intermediate phase where the interesting strategy starts to emerge.

There are some players who wrongly believe that this "dumbs the game down." Actually, the opposite is true. Experts can perform special moves already, so the changes listed below have very little effect on them. Experts will care about actual balance changes such as hitboxes, recovery times, new properties for some moves, and so on. Making special moves easier, however, just allows everyone else to play the "real" game without needing to develop hundreds of hours of muscle memory just to perform the moves. It's actually sad to hear that some players think that their ability to execute a 360 command throw is why they are good, as opposed to the actual strategy of getting close enough to the opponent with Zangief to land the throw.

Another wrong-headed comment I often get is that easier controls don't leave enough skills in the game to separate good and bad players. The statement is absurd. Easier special moves don't change the strategic depth of the game at all (and the actual balance changes in HD Remix hopefully increase the strategic depth). Furthermore, there's no shortage of nuance for experts. Does Cammy's dragon punch beat Fei Long's? It depends on exactly who did it first, which means that 1/60th of a second timing is just as important as ever. So is positioning, spacing, the difficulty of performing combos, and the skill of reading the mind of the opponent.

Easier Moves Overview

• Dragon punch timing is more forgiving
• 360 throws have alternate motions
• Tiger knee motions have been removed
• Mash moves are easier
• All 3-button moves changed to 2 buttons

All dragon punches are easier because the timing window to perform them is no longer random: you now always get a 15 frame window between each joystick motion rather than a random number between 8 and 15 (and you only had a small chance of getting 15 in the original game). 360 motions are easier because they no longer require you to hold up, which used to lead to accidental jumps. Spinning Piledrivers can now be done by half-circle forward, then back + punch or half-circle back, then forward + punch. There is a lot of leeway on these commands so that they can still be done from defensive crouch, and the old 360 commands still work too.
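To make the timing change concrete, here is a minimal sketch under a simplified model: each joystick motion must arrive within a window of frames after the previous one. The function names are made up for illustration, and the uniform random draw is an assumption. The text above says only that the old window was a random number between 8 and 15 with a small chance of 15, so this is not the game's actual input code.

```python
import random

FIXED_WINDOW = 15  # HD Remix: always 15 frames between joystick motions (per the text)

def classic_windows(rng, gap_count):
    """Hypothetical model of the original game: one random window (8 to 15) per gap.

    The real distribution was not uniform (15 was rare); uniform is assumed here.
    """
    return [rng.randint(8, 15) for _ in range(gap_count)]

def motion_accepted(input_frames, windows):
    """Return True if each gap between consecutive inputs fits its window.

    input_frames: frame number at which each direction was entered.
    windows: allowed number of frames for each gap, in order.
    """
    gaps = [later - earlier for earlier, later in zip(input_frames, input_frames[1:])]
    return all(gap <= window for gap, window in zip(gaps, windows))

# A player who takes 12 frames between each of the three dragon punch directions.
slow_but_steady = [0, 12, 24]

rng = random.Random(0)
attempts = 1000
classic_successes = sum(
    motion_accepted(slow_but_steady, classic_windows(rng, 2)) for _ in range(attempts)
)
remix_success = motion_accepted(slow_but_steady, [FIXED_WINDOW, FIXED_WINDOW])

print(f"classic model: {classic_successes}/{attempts} attempts accepted")
print(f"fixed 15-frame window accepted: {remix_success}")
```

In this toy model the same 12-frame rhythm fails some of the time under the random window and always succeeds under the fixed one, which is the kind of inconsistency the change removes.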

Most commands ending with diagonally up/forward have been changed to much easier motions. Sagat's Tiger Knee is a dragon punch motion now (as it is in later games). Cammy's Hooligan Throw and Fei Long's Flying Kicks are now fireball motions (qcf + p and qcf + k, respectively), so no more accidental jumping frustrations.

The "mash moves" require less mashing. That means it takes fewer button presses to activate Chun Li's Lightning Legs, Honda's Hundred Hand Slap, and Blanka's Electricity.

All moves that required three simultaneous button presses now only require two. This is specifically to make the moves easier to execute on a gamepad (as opposed to an arcade joystick). Because of the way you hold a gamepad, it's easier to hit the jab + short buttons together with your thumb than it is to hit the jab+strong punch buttons. For this reason, there are a lot of jab+short commands now. Zangief's kick lariat, Vega's single defensive flip, Blanka's hop, and T.Hawk's aerial dive can all be done with jab+short as well as the original three button commands. Zangief's punch lariat and Vega's double defensive flip can be done with either strong+forward, fierce+roundhouse, or the original commands. Dhalsim and Akuma's teleports only require two punch or two kick buttons now, as does Balrog's turn punch (but don't worry, you can't charge turn punch while having access to fierce and roundhouse at the same time).

All of this taken together means that it's easier than ever to get your moves to come out, especially on a gamepad. These changes alone increase the fun factor of the game quite a bit, especially for T.Hawk, Cammy, and Fei Long because their moves were so hard to do before.

Balanced for Tournament Play

Super Turbo is a delicate ecosystem, so changing anything can affect game balance a lot. Because there's so much potential to wreck things, I needed a plan that leverages all the knowledge I have about high-level play over the last 14 years. I picture a flat piece of wood with 100 indentations on it and 100 marbles. If we have 90 of the marbles resting in the right indentations, we wouldn't want to violently shake the whole thing around in hopes of fixing the last 10.

After over a decade of tournaments, we know which characters are the best (Balrog and Dhalsim for sure, and Old Sagat in the US and Vega in Japan, with Chun Li as an honorable mention). We know which characters are the worst (Cammy, Fei Long, T.Hawk, Zangief, and Blanka). And which are in the middle. My goal was to buff up the worst characters so they reach the middle (or upper middle at best). Next, buff the middle characters slightly, but not so much that they become top tier. And finally, leave the top tier characters intact. In other words, the idea is to compress the tiers so that the difference in power between the best characters and worst characters is much smaller than before.

This approach gives me some margin of error. I tried to make the previously weak characters about 2nd tier, knowing that it's very possible for them to end up better than expected. If they end up top tier, that's fine, but if I tried to make them top and they ended up even above that, it would be a major problem. Even if the weak characters end up 2nd tier or slightly below, they'll still be much more able to win than before, and that's good news.

Keeping the top tier at about the same power level is a good idea for a few reasons. First, I have a very solid idea of how powerful a character needs to be to be top tier (same as always!). Next, to use my last analogy, rolling around fewer marbles is better, so it's safer to leave the top tier alone than it would be to bring them down in power and have no idea who's good anymore. Also, as I said when I rebalanced Puzzle Fighter, we already know what the game felt like with the previous top tier characters, and it was fun, so it's better to balance the game around that power level than a new, lower power level. And finally, to restate that, there are so many games that try to fix *everything* and nerf everything to such a low power level that even though things might be "fair," they are no longer fun. I call this the Marvel vs. Street Fighter syndrome.

That said, there are some nerfs to the top tier. It sounds like I just contradicted myself, so I want you to understand this important distinction. Imagine that a top tier character has 10 awesome things about him or 10 ways to win. If I really wanted to nerf his power level, I would make all 10 of these things, say, 20% worse. But what if one of those 10 things is so abusable that it can be repeated over and over pretty mindlessly, leading to shallow gameplay? This is a case where I think I can remove or tone down that 1 option and leave the other 9 just as strong as ever. This does not even necessarily reduce the overall power level of the character—it just forces the player out of repeating loops and into other more interesting options.

There are several of these situations in Super Turbo, and rather than trying to muck with every possible one, I think it's just safer to remove the repeatable abuse from the top tier characters only—the abusable stuff that can often decide matches.

It's ironic that as a player, I seek out exactly these kinds of repeatable, mindless moves, yet as a designer they are what I tried to remove. The list of toned down things is very, very short in comparison to the list of new, powered up stuff, so I think that the fun factor is going up in addition to compressing the tiers for balance.

I hope that you find Street Fighter HD Remix easier to play than Super Turbo, with more strategic depth, and with fewer lopsided matches than ever.

--Sirlin

Friday, Oct 17, 2008

Balancing Multiplayer Games, Part 4: Intuition

We’re in much deeper trouble than I’ve been letting on when it comes to balancing games. The problem is that you cannot solve your game--you definitely cannot--yet you must somehow balance it.

By solving, I mean you cannot determine how to play your game optimally. If you could determine this, there would be no actual strategy left in your game, so it would be boring and not worth talking about in the first place. If you can solve your game, your players can definitely solve it. If you can’t solve your game, your players might still solve it. In any case, we know you can’t solve it, because if you could, it would mean you did a bad job designing it in the first place.

How in the world can you balance something when it’s impossible to know the best ways of playing it? If you aren’t worried about this, then you don’t understand how wicked the problem is. The techniques I discussed in the previous three articles will help, but they remind me of what art director Larry Ahern said when he was preparing to draw all the backgrounds in The Curse of Monkey Island. He said that by following the rules of composition from classical painting, he believed he could get results that were "not terrible." But, he said, going from not terrible to great was something he hoped he had within him, and that it's not exactly possible to get there by following someone else's cookbook of rules.

Picking The Top Players

Let’s back up to an easier problem. Imagine I gave you a room full of players of a certain game and I asked you to determine who the best player is, and who is second best. How would you do it? Answer: you would have them all play each other.

What if I don’t let anyone play the game, though? I’ll let you interview the players or have them submit written answers to your questions about how they will play the game and what they know about the game. Can you determine the best players from this method? I bet you will do only slightly better than monkeys throwing darts to determine the answer. In all my experience running and competing in tournaments, I can say with some authority that there is little correlation between ability to win and ability to explain yourself.

Why are the best players not necessarily able to reveal themselves as best through interviews or speaking? I claim there are two reasons:

1) Spoken and written answers have extremely narrow bandwidth.
2) It’s impossible to access many of our own skills with conscious thought.

Both of these ideas have to do with the concept of the mental iceberg.

The Mental Iceberg

Imagine an iceberg that represents your total knowledge, skill, and ability at something, for example in playing a certain competitive game. The small part of the iceberg above the waterline is what you have direct conscious access to; it’s what you can explain. The gigantic underbelly of the iceberg is the part you do not have direct access to, and yet it accounts for far more of your overall skill than the exposed tip. When we interview players or ask them for written answers about how they might play, we are only accessing the tip. If one player’s iceberg has a larger tip (he tells a better story about how he will win), it’s entirely possible that his hidden below-water iceberg is much smaller than another player’s, and that’s really what matters.

The tippy-top part represents what you can explain or consciously understand. The huge underside represents your vast unconscious.

Narrow bandwidth

The amount of information you can convey in a written or spoken answer is actually very small compared to the storehouse of knowledge and decision rules you have stored in your head. Also, spoken and written language encourage linear thinking, while your actual decision-making might be a more complex weighting of many different interconnected factors. In a written answer, a player might say “move A beats move B, so I will concentrate on using move A in this match.” But really it might depend on many factors: the timing of move A, the distancing, the relative hit points of the characters, the mental state of the opponent, and so on. Players cannot communicate these nuances in an explanation the way they can enact them during actual gameplay.

No Direct Access to Parts of Our Own Minds

This concept might be hard to swallow at first, but it should be incredibly obvious if you think about it for a moment. You are not conscious of how your digestive system works. You do not have direct access to how your cells make and break the bonds of ATP and ADP to give your body energy. When you see a frisbee travel across the sky, you are not aware that your eye moves in a particular pattern of jerky movement that’s common in all humans (you believe that you smoothly follow the moving object).

One study estimates that the human brain takes in about 11,000,000 pieces of information per second through the five senses, yet the most liberal estimates say that we can fit at most 40 pieces of information in conscious memory. There is A LOT going on behind the scenes, and we do not have conscious access to it, even though we are still able to make decisions that leverage all that information. (Wilson, p.24.)

Blindsight

This is what blindsight looks like.

The medical condition of blindsight is a particularly telling example. Blindsight is blindness that results from having damage to a certain part of your visual cortex. There are actually two different neural pathways for vision, and people with blindsight have only one of these pathways blocked. The result is that they are blind, meaning specifically that they don’t consciously experience seeing. Even though they claim to see black, they can still make decisions based on eyesight. In one experiment with a blindsight subject named DB, experimenters showed him a circle with either vertical or horizontal black and white stripes. Even though he couldn’t see, and so had no idea whether the stripes were horizontal or vertical, and sometimes became agitated when asked to guess, his “guesses” were correct between 90 and 95 percent of the time. In other words, people with blindsight can perceive the world more accurately than their conscious minds can explain. (Blackthorne, p.263.)

Instant Decisions

Another clue to this concept lies in decisions that we make extremely quickly. Consciousness does not coalesce instantly; it takes somewhere between 0.3 and 0.5 seconds to form. I know that that sentence is highly controversial amongst brain researchers, but I think it’s generally safe to say that in times shorter than that, we have not yet formed enough of an awareness about what’s happening to be conscious of it. And yet, experiments show that we make decisions based on outside stimuli faster than this. For example, when people are asked to grab wooden rods as they light up a certain color, and the experimenter cleverly lights up one rod, then as you are reaching for it, darkens that rod and lights up a different one, he can measure when your hand made the course correction to go for the newly-lit rod. The course correction occurs almost immediately, much faster than 0.3 seconds, and yet the subjects believe they course correct only at the last moment, after 0.5 seconds. In fact, they don't even consciously perceive that the lights on the rods changed until after they made the course correction! They are making decisions before they are conscious of what is going on.

Tennis is a more real-world example of this. Tennis pros can serve the ball at 130mph, and the distance between baselines is 78 feet. That means it takes 0.41 seconds for the ball to reach the opponent. New York Times writer David Foster Wallace said:

The upshot is that pro tennis involves intervals of time too brief for deliberate action. Temporally, we’re more in the operative range of reflexes, purely physical reactions that bypass conscious thought. And yet an effective return of serve depends on a large set of decisions and physical adjustments that are a whole lot more involved and intentional than blinking, jumping when startled, etc. (New York Times.)

Tennis pro Roger Federer has explained in interviews that he doesn’t like to be called a genius at the game, because he doesn’t think during the incredible moments when he returns balls few other players can. He acts before he is conscious of the situation by leveraging his unconscious skills.
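The 0.41-second figure for the serve can be checked with a few lines of arithmetic. This is a rough sketch that assumes the ball covers the full 78 feet at a constant 130mph, ignoring the bounce and any slowdown from air resistance; the function name is just for illustration.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def travel_time_seconds(distance_feet, speed_mph):
    """Time to cover distance_feet at a constant speed given in miles per hour."""
    speed_feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return distance_feet / speed_feet_per_second

serve_time = travel_time_seconds(distance_feet=78, speed_mph=130)
print(f"{serve_time:.2f} seconds")  # prints "0.41 seconds"
```

That lands inside the 0.3 to 0.5 second range mentioned earlier, which is why returning a serve is such a good example of acting before conscious awareness has fully formed.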

Heuristics We Use But Can't Explain

Baseball gives us another important example. How do fielders catch fly balls? It seems like a very complex math problem with variables for speed, trajectory, gravity, friction from air resistance, wind influence, etc. Should fielders run as quickly as they can to the general location where the ball will land, then make adjustments as they solve these equations somehow?

No. The best way to catch a fly ball is to use the gaze heuristic, as described in the book Gut Feelings. The method is to look at the ball, start running, and adjust your running speed so that the angle of your gaze remains constant. You will then reach the ball just as it lands, and you’ll be in the right place. Experimenters found that the best professional baseball players use this method (and so do dogs), but that most of the players don’t know that they use it, and are unable to explain any method they use to catch fly balls. (Gigerenzer, p.10.)
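Here is a toy sketch of why holding the gaze angle constant puts you at the landing spot. It assumes a flat field, a ball with no air resistance, motion along a single line, and a fielder who fixes the angle at the top of the ball's arc; all names and numbers are made up for illustration. It only computes where the fielder would need to be to hold the angle, and does not check whether the required running speeds are humanly achievable.

```python
G = 9.81  # gravitational acceleration in m/s^2

def ball_state(t, vx, vy):
    """Horizontal position and height of a drag-free ball launched from the origin."""
    return vx * t, vy * t - 0.5 * G * t * t

def gaze_heuristic_path(vx, vy, fielder_start, t_start, steps=50):
    """Positions the fielder must occupy to keep the gaze angle to the ball constant.

    The fielder fixes the angle at t_start, then moves so that
    height / horizontal_distance to the ball never changes.
    """
    t_land = 2 * vy / G
    x0, h0 = ball_state(t_start, vx, vy)
    tan_gaze = h0 / (fielder_start - x0)  # tangent of the angle to hold constant
    path = []
    for i in range(steps + 1):
        t = t_start + (t_land - t_start) * i / steps
        x, h = ball_state(t, vx, vy)
        h = max(h, 0.0)  # guard against tiny negative rounding at landing
        path.append((t, x + h / tan_gaze))
    return path

vx, vy = 15.0, 20.0          # hypothetical launch velocities in m/s
t_apex = vy / G              # the fielder locks the gaze angle at the top of the arc
path = gaze_heuristic_path(vx, vy, fielder_start=70.0, t_start=t_apex)

landing_x = vx * (2 * vy / G)
final_t, final_x = path[-1]
print(f"ball lands at {landing_x:.2f} m, fielder ends at {final_x:.2f} m")
```

As the ball's height shrinks to zero, the only way to keep the ratio of height to distance constant is for the distance to shrink to zero too, so the fielder and the ball arrive at the same spot without anyone solving a trajectory equation.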

This example shows that it’s very possible for the correct answer to be hidden in your mental iceberg’s underbelly, but it’s not necessarily a representative example. I chose it on purpose because the underlying decision process can be simply described, which allows me to describe it to you. But what if the underlying decision process relies on a complex weighting of variables that isn’t easy to describe? This is another clue that explanations of how to solve complex problems are just tips of the iceberg, and not necessarily accurate.

Before we get back to solving our near-impossible task of balancing a game that we can’t possibly know how to play optimally, I’d like us to look at two cases outside of games where experts solved very difficult problems. The different methods they used are extremely applicable to our problem, and to solving any other highly complex problem.

The Case of the Greek Statue

From an example in the book Blink, the Getty Museum of California was considering purchasing a 2,600-year-old Greek statue for almost $10 million. To determine if it was a fake, the museum had its lawyers investigate the paper trail of the statue’s ownership and whereabouts over the last several decades and had a geologist named Margolis

...analyze the material composition of the statue. The geologist extracted a 1cm by 2cm sample from the statue and analyzed it using an electron microscope, electron microprobe, mass spectrometry, X-ray diffraction, and X-ray fluorescence. The statue was made of dolomite marble from the island of Thasos, Margolis concluded, and the surface of the statue was covered by a thin layer of calcite -- which was significant, Margolis told the Getty, because dolomite can turn into calcite only over the course of hundreds, if not thousands, of years.

The statue in question.

After 14 months of analysis by the lawyers and scientists, the Getty was ready to buy the statue. And then the trouble started. When the Getty was nearing the unveiling of the statue, a few art experts saw it and each of them had an immediate reaction that something was wrong. They didn’t know what, but they thought it was a fake.

One of those experts was Thomas Hoving, the former director of the Metropolitan Museum of Art in New York. When Hoving saw the statue, the first word that popped into his head was “fresh,” which he thought was an odd word to describe a statue that’s thousands of years old.

The Getty was worried, so they shipped the statue to Athens where they invited art experts to a symposium to look at the statue. Most of them said it was fake, too. “It’s the fingernails” or “it’s the hands” or “statues don’t come out of the ground looking quite like that,” people said. They didn’t know just why, but they knew. Then the Getty’s lawyers discovered that some of the statue’s ownership documents had been faked. The geologist (who was so proud of his examination that he wrote an article about it in Scientific American) discovered that it was possible to convert dolomite to calcite in just a few months using potato mold.

How did Thomas Hoving know something instantly that the lawyers and scientists could not discover after 14 months of investigation? He knew because he had an enormous mental iceberg of knowledge and expertise in this exact area. He dug up statues himself in Sicily. He also said:

“In my second year working at the Met, I had the good luck of having this European curator come over and go through virtually everything with me. We spent evening after evening taking things out of cases and putting them on the table. We were down in the storerooms. There were thousands of things. I mean, we were there every night until ten o’clock, and it wasn’t just a routine glance. It was really poring and poring and poring over things.”

The lawyers and scientists had a large “iceberg tip” in this case. They had lots of explanations why the statue was real. But if there’s anyone in the world who has a tiny iceberg underbelly when it comes to knowledge about Greek statues, it’s lawyers and scientists. Hoving’s iceberg tip was small (he just had a feeling), yet his iceberg underside was enormous. (Gladwell, pgs.3-11,184.)

Thomas Hoving knows statues.

Can you imagine trying to detect fake statues by asking Hoving to give you a theory of fake statue detection? What if his theory left out lots of things he unknowingly uses to solve the problem? What if his theory is actually wrong, and doesn’t reflect his methods at all (because he doesn’t know them himself)? That’s the inherent problem with requiring that any expert synthesize a theory of his own expertise.

The way to solve a complex problem is to develop an enormous iceberg underside. Is that just saying that you need “experience,” though? Experience is kind of a dirty word to me. George Bush has “experience” as a president and with foreign affairs. Do you want him running the country (or anything else)? Meanwhile Lincoln had hardly any experience. Experience is only as good as the person who has it, and even besides that, we usually use completely the wrong scale to measure experience.

If you have the experience of “shipping 10 games,” for example, that’s great but it doesn’t have anything to do with the particular type of experience--the particular iceberg of knowledge--that is involved with balancing a complex asymmetric game. Before we try to develop this iceberg, let’s look at one more example.

The Case of the Space Shuttle Disaster

Imagine that your problem is that you must determine how and why the Space Shuttle Challenger crashed. You are on the investigative committee, looking for this answer. What is the best way to solve this problem? Is it to be an extreme expert in aerospace engineering? Is it to be an expert at investigating disasters? The answer is to first live the life of Richard Feynman, then solve the problem.

Feynman is one of the most brilliant people who ever lived, and he demonstrated his ability to solve complex problems in many fields outside of his own field of physics. Feynman was not an engineer, but the shuttle problem required an engineering analysis. He was a fish out of water on the investigative committee, and had no experience doing anything like that. What he did have experience doing was analyzing problems in general. Taking in vast amounts of information, organizing it in his head, figuring out what mattered and what didn’t.

Feynman dramatically reveals why the Challenger crashed with a glass of ice water and rubber.

Feynman took nothing at face value, ignored the rules of the committee, questioned everyone he could, ignored authority figures and the politics of the investigation, and instead focused on getting real information. His very first interview was a marathon session with the engineers who designed the shuttle’s rockets. They had the iceberg of knowledge he needed and he knew how to get at least a piece of it out of their heads and into his own. By ignoring illusions like who had important titles or who supposedly knew anything and instead focusing on who actually had the relevant icebergs of knowledge, Feynman--and no one else on the commission--discovered that the real problem with the shuttle was lack of resilience of the rubber O-rings during cold weather. (Feynman, pgs.113-153.)

If you are going to solve a complex problem, the two best ways are to be like Hoving or to be like Feynman. When I worked on Street Fighter, I was like Hoving. I have a mountain of knowledge about that particular game, so my expert opinion, even if it expresses itself as just a feeling, is worth a lot. On other games I worked on such as Kongai and Yomi, I was more like Feynman. I know how to solve balance problems in general, but there are playtesters who have bigger icebergs of knowledge about how to play the game at an expert level than I do, so I question them, watch them, and rely on them.

Developing the Iceberg

How do you actually get the iceberg of knowledge in the realm of balancing competitive games? Ideally, you want the Feynman-type of ability that can be applied to many types of problems, not just a very narrow domain of one game.

The problem with developing this type of knowledge is time. If we are instead trying to become expert players (rather than expert game balancers), we have access to a very fast feedback loop. Play the game against people better than us, see what worked and didn’t, adjust, play again. A game of Street Fighter takes only a couple minutes, and even an RTS game takes less than 1 hour. But creating a game and seeing how its balance turns out takes years. It’s a very slow feedback loop, and extremely few people get to even participate in it directly.

I think that there is a way to gain the necessary knowledge though. Here are the games I studied:

1) Street Fighter. I know about more than 20 versions of this game.
2) Virtua Fighter. It says version 5 on the box of the latest one, but really, there have been at least 15 versions of this game if you look closely.
3) Guilty Gear. I know of 8 versions of this game.
4) Magic: The Gathering. This game has changed (with new sets of cards) about 3 times per year for over 10 years.
5) World of Warcraft. I played that game for two years before it was released and I couldn’t even guess the number of mini-releases over that time. Maybe 50 or 100.

That is A LOT of data about how changes to a game’s balance turn out. You can study what the exact changes are from one version of a game to the next, then learn how those changes actually affected the game’s balance and how the players perceived the changes. I actually count those as two separate things, and on Street Fighter I had two separate main advisors: one who knew the most about how a change would affect the game system itself, and another who knew the most about how players would perceive changes.

You have to put in real, effortful study on following games like the ones I listed above, though. Just being along for the ride doesn’t necessarily get you much. Also, experience working at a game company is almost a detriment here, because game companies I’ve worked at don’t spend any time looking at things like “exactly why did Virtua Fighter change this move’s recovery to 8 frames?” Instead, the focus is on actually implementing and shipping games.

I also think that you can’t really fake this. There is no way I could have accumulated the knowledge that I have if my motivation was to be better at my career. My motivation is that I am actually interested in things like this. Thomas Hoving was actually interested in art history. You have to live an authentic life in your chosen area of interest to develop true, deep knowledge of it.

Garcia vs. Sirlin (Analysis vs. Intuition)

In Street Fighter, there is no possible way to create a "balancing algorithm" that will tell you if Chun Li's walking speed should be faster. How good is faster walking speed compared to damage on her fierce punch, compared to priority of her ducking medium kick, etc? You could do a year of math on that and still be more wrong about it than my guess in two seconds.

I was very aware of this when I designed Kongai, and I tried to make it as difficult as possible to compare the relative value of moves because of their varied effects. I wanted valuation (the ability of players to intuitively know the relative value of moves in specific situations) and yomi (the ability of players to know the mind of the opponent) to be the two main skills in the game. And then I encountered garcia1000, a Kongai player who came from the poker community. While my goal as a designer is to facilitate valuation and yomi, garcia's goal as a player is to play optimally without needing to make any judgment calls on those things at all. He wants to compute the odds and solve the game. You could say that he's my worst nightmare, but his work is also fascinating.

This is the Kongai virtual card game I designed for kongregate.com.

Garcia started by creating several "endgame problems" in Kongai. He chose very specific situations: Character A vs. Character B, all other characters are dead, life totals a certain amount, fighting range set to far, item cards given, etc. He would give very specific situations (which were much simpler because there are fewer choices during the endgame), and then invite other testers to work out the solution for optimal play. Each specific situation took dozens of pages of forum posts to settle. They would also show amusing things such as optimal play giving a 3% edge over the more obvious plays, in one case. It was also amusing that the math required to solve these endgame situations was far more complicated than the math I used to set all the tuning variables. (Because I mostly used intuition....)

What you have to keep in perspective though, is how limited these endgame solutions are. You could know the solution to dozens of them and you wouldn't have solved even 0.00001% of the game. The number of possible game states is very large indeed, and each of these endgame problems that took dozens of pages of posts to figure out solved only ONE gamestate. If I had used rigorous math to solve Kongai during development, I could have been working on it for 100 years.

Trusting Intuition

If you have this iceberg of knowledge, or if you’re like Feynman who can rely on the icebergs of others, you should still know two things about maximizing the value of intuitions. Intuition by experts is better at solving complex problems than analysis, but:

1) The intuitive expert will be less sure of his answers, while incompetent people will be very sure of their (wrong) answers.
2) Having to explain yourself diminishes your ability to draw on your intuition in the first place.

This is too large of a topic to go into depth on here, so I’ll give only a short summary. There’s a wonderful study on incompetence that shows that people who are incompetent at a task (logic, humor, grammar, etc.) grossly overestimate their own ability at the task and are unable to detect expert performance in other people who actually are skilled at the task. The reason is that the very knowledge they lack to do the task is the same knowledge they need to evaluate themselves and others.

I don't know what this diagram means, but it seems important somehow.

The result is that you will definitely have to deal with the loud complaints of incompetent people who are quite sure of themselves, and who might even have a well-developed tip-of-the-iceberg of reasoning, but no underside to their iceberg at all. I suggest somehow gaining enough authority that your vague feeling on a balance issue is able to trump their loud complaints. You might even try explaining why that is best for the game.

Several studies show that explaining yourself wrecks your intuition. If you see a person’s face and must then identify that person later in a lineup, you will do much better if you do NOT have to explain the face in detail beforehand. Your explanation is imperfect because the bandwidth of words is so narrow, yet your knowledge of the face is nuanced. The story you create about the face overwrites your actual knowledge and makes you perform worse in the lineup test. Other studies show that requiring an explanation of thought process makes test subjects less able to come up with creative solutions for problems.

While balancing Street Fighter, I had the luxury of not having to really explain myself to anyone, and that was a great advantage. Note that I happily explained everything after the fact, but I’m talking about in the heat of development. If I wanted to try a balance idea, and I wasn’t exactly sure why I wanted to try it, I could. I did not need to convene a meeting and lay out a logical plan that people voted on. I could just do it, and test it.

There was a brief period of disaster where a new producer tried to track every single task I planned to do in the balancing process. I was reluctant to submit any list of future changes because every day, the landscape changed. Tomorrow I might learn how to implement something that I thought before was impossible to implement. The next day I might learn that a recent change removed the need for some other future change, based on playtest results. Every day, the thing I worked on was whatever thing I felt was most important that day.

The agility that method allowed was amazing, and it's the only way I can imagine doing things. I think it’s pointless to track my work in the way that producer wanted to because doing so gives the overall project no advantage, while it damages my ability to draw on my intuition. Game balance has not been on the critical path of development on any game I’ve ever worked on, meaning there’s always some other thing that pushes out the ship date. In balancing, you keep doing it and doing it until someone says you have to ship.

By the way, I went back to doing whatever I wanted with no oversight fairly soon on the Street Fighter project, so mostly ignoring the new producer's requests was a successful strategy.

My advice to not explain yourself and to have the authority to ignore incompetent complainers unfortunately sounds like a recipe for creating an ego-centric dictatorship that ruins a project. Yet, the best way to leverage intuition is to gain that kind of power on a project, and then not use it much. You want your subordinates to do their best jobs without having to explain every little thing to you either, after all.

Conclusion

Balancing a game when we know we can’t know how to play that game optimally is a deeply troubling problem. Logical analysis often fails at this type of complex problem because it doesn’t take into account all the nuances that our unconscious minds and intuitions can. To solve this balance problem, or any similar problem, we should build up a vast mental iceberg of knowledge and experience in the field. I don’t mean fake experience like working at a game company or getting your name listed in credits though, I mean real experience which only comes from effortful study. Use your own iceberg of knowledge in the form of intuition and seek out others with vast icebergs of knowledge and rely on their advice. Finally, somehow acquire enough power on a project that you don’t let your valid feelings about what to do get trumped by loud disagreement from incompetents and don’t let your intuition be destroyed by anyone who demands constant explanations of your every decision.

Friday
Oct172008

Balancing Multiplayer Games, Part 3: Fairness

In asymmetric games, we have to care about making all our different starting options fair against each other in addition to making sure the game in general has enough viable options during gameplay. That means each character in a fighting game and each race in a real-time strategy game should have a reasonable chance of winning a tournament in the hands of the right player. For collectable card games and team games like Guild Wars and World of Warcraft’s arenas, we should instead say that at least “several” possible decks and class combinations should be able to win tournaments.

Self-Balancing Forces

To make this semi-impossible task easier, we should use self-balancing forces if possible. This will let us go nuts with diverse options while building in some fail-safes to protect us from unknown tactics that players might develop in the future. I’ll give examples of this from two games: Magic: The Gathering and Guilty Gear XX.

In Magic, the various game mechanics such as counterspells, direct damage, healing, and so on, are divided amongst five colors. Players can build decks with as many of these five colors as they want, but the more colors they include, the harder it is to have the right mana to actually play the various colors of spells.

A simple diagram of the 5 colors of magic.

Consequently, decks are forced to specialize, which gives them inherent weaknesses. The color red, for example, has no way to destroy enchantment cards, so even if a red deck ended up being strong, it has a built-in weakness (it must either accept that it can’t destroy enchantments, or weaken its consistency by trying to incorporate another color that can). Also, each color has two enemy colors, and those enemy colors often include cards that are specifically powerful against their enemy colors. Again, if a red deck became too powerful, there will be blue and white cards that keep red in check, at least somewhat.

Finally, when Wizards of the Coast prints a new set with new mechanics, they usually include a card or two that are tuned to be fairly weak, but that specifically counter the new mechanic. I think they hope that these specific counters are not needed, but if the metagame becomes completely overwhelmed by the new mechanics, then there are at least some fail-safes the metagame can use to fight the new mechanic.

For example, Magic's Odyssey block focused on new mechanics involving the discard pile (called the "graveyard" in Magic), and the card Morningtide could remove all cards from all graveyards. If players started getting too tricky with their graveyards, Morningtide was a counter. In practice, this counter wasn't really needed though. Later on, Magic's Mirrodin block focused on artifact cards. The card Annul could counter artifacts (and enchantments) for only one mana, and the card Damping Matrix prevented artifact abilities from working. In Mirrodin's case, the artifact mechanics really did get pretty out of hand. Annul and Damping Matrix were good ideas, but even stronger failsafes were needed during Mirrodin.

This is really a similar concept to Yomi Layer 3 that I mentioned in part 2. The idea is to build in counters to the game so that even if some things end up more powerful than you expected, the game is resilient enough that players can deal with it.

Guilty Gear is a very important example for its fail-safe systems. I described that game’s system in detail in this article, but here’s a quick refresher.

Guard Meter

Every time you hit the opponent, their “guard meter” goes down. The lower it is, the shorter their hitstun is. That means that even if a string of moves is an “infinite combo”, meaning that once you land the first hit, you could keep hitting them forever, their shorter hitstun eventually lets them block to escape the combo.
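As a rough illustration of why shrinking hitstun ends an "infinite," here is a minimal Python sketch. Every number in it is a made-up frame count, and the function name and parameters are hypothetical; it is not Guilty Gear's actual data or formula, just the general shape of the fail-safe.

```python
def hits_before_escape(base_hitstun=20, shrink_per_hit=2, frames_to_relink=12):
    """Count how many hits of a would-be infinite combo connect.

    All values are hypothetical frame counts, not real Guilty Gear data.
    Returns None when hitstun never shrinks, i.e. the combo really is infinite.
    """
    if shrink_per_hit <= 0:
        return None  # no fail-safe: the loop could go on forever
    hitstun = base_hitstun
    hits = 0
    # The combo continues only while the victim is still stunned
    # by the time the attacker's next hit arrives.
    while hitstun >= frames_to_relink:
        hits += 1
        hitstun -= shrink_per_hit  # lower guard meter means shorter hitstun
    return hits


print(hits_before_escape())                  # finite number of hits
print(hits_before_escape(shrink_per_hit=0))  # None: a true infinite
```

The point of the sketch is that the designer does not need to know which loop exists. Any positive shrink per hit guarantees the loop ends eventually.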

Progressive Gravity

When you are juggled in the air during a combo, the gravity applied to your character gets greater and greater over time. So even if a combo could juggle forever somehow, the victim’s body falls faster and faster over time, which would eventually ruin the infinite juggle.
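The same idea can be sketched for juggles. This assumes a very simple model that is not taken from the game: each hit relaunches the victim at the same upward speed, gravity is multiplied by a fixed factor after every hit, and the attacker needs a fixed number of frames to land the next hit. All names and numbers are hypothetical.

```python
def juggle_hits(launch_speed=10.0, base_gravity=1.0,
                gravity_growth=1.25, frames_to_relaunch=8.0):
    """Count juggle hits before the victim falls too fast to be hit again.

    Hypothetical model: airtime after a launch is 2 * speed / gravity,
    and gravity grows by `gravity_growth` after every hit.
    Returns None if gravity never grows and the juggle would never end.
    """
    gravity = base_gravity
    if 2 * launch_speed / gravity < frames_to_relaunch:
        return 0  # not even the first relaunch is possible
    if gravity_growth <= 1.0:
        return None  # no progressive gravity: the juggle is infinite
    hits = 0
    while 2 * launch_speed / gravity >= frames_to_relaunch:
        hits += 1
        gravity *= gravity_growth  # the victim falls faster after each hit
    return hits


print(juggle_hits())                    # finite number of juggle hits
print(juggle_hits(gravity_growth=1.0))  # None: nothing stops the juggle
```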

Green Blocking

Imagine an attack sequence against a blocking opponent where you do a few hits in a row that leave you pushed back, too far away to continue. But when you get to the last hit, you cancel it with a special move that makes your character move forward. After that, you repeat the sequence and force the opponent to block forever. In case this type of lock-down trap exists, Guilty Gear heads it off at the pass with a feature I call “green blocking.” While blocking, you can use some of your super meter to create a green force field that pushes the opponent pretty far away from you, letting you ruin the spacing of his trap.

Here's that green blocking thing in Guilty Gear.

Each of these features is designed to solve a problem that the designers didn’t even know they had. They just knew that if the game ever ended up in a state of infinite combos or juggles or lockdowns, some fail-safe features would need to save them. Also, these fail-safe features freed them to design incredibly varied and extreme characters. No matter how crazy a character is, or how scary its rushdown tactics ended up, the designers knew that this defensive system of fail-safes shared by all characters would keep things at least somewhat in check.

Playtesting and Course-correcting

Whether or not your game has fail-safe systems, at some point you have to design a diverse set of characters / races / whatever, make each one coherent and interesting, then have the confidence that you’ll sort out the balance problems in playtesting. All the theory in the world will not save you from playtests, of course.

You need to start tuning the game, and react and learn as you go. Do not let a producer turn tuning into a fixed list of items that you are accountable for checking off, one by one. It’s an organic, continuous process that keeps going until you need to ship the game. Playtesting lets you discover things you couldn’t have predicted ahead of time, and you should be open to those discoveries. The goal isn’t to make the exact game you originally envisioned, because your original vision did not take into account all the things you learned from development and playtests. When you or the testers discover nuances or unexpected properties, you have the chance to build around those and incorporate them into the game’s balance.

The Tier List

During the balancing of Street Fighter, Kongai, and my card game called Yomi, I used a similar approach with playtesters. I think this approach doesn’t really depend on the genre, and the key idea is managing the tier list.

The term “tier list” is, I think, a term from the fighting game genre. It means a ranking of how powerful each character is from highest to lowest, but it also accepts that such a list cannot be exact. Instead of ranking 20 characters from 1 to 20, the idea is to group them together into “tiers” of power. Remember that even if a divine being handed you a 100% perfectly balanced game, players would still make tier lists. You should accept the existence of these lists from players as a given, and it’s your job to manage this list.

In Kongai and Yomi, I even gave the players a template for the tier list that is most useful for me as a designer. First, I tell them to think of three tiers: top, middle, and bottom. Then I tell them about the two “secret tiers” that I hope are empty.

0) God tier (no character should be in this tier, if they are, you are forced to play them to be competitive)
1) Top tier (don't be afraid to put your favorite characters here. Being top tier does not necessarily mean any nerfs are needed)
2) Middle tier (pretty good, not quite as good as top)
3) Bottom tier (I can still win with them, but it's hard)
4) Garbage tier (no one should be in this. Not reasonable to play this character at all.)

My first goal of balancing is to get the god tier empty. Of course some character will end up strongest, or tied for strongest, and that is ok. But a “god tier” character is so strong as to make the rest of the game obsolete. We have to fix that immediately because it ruins the whole playtest (and the game). Also, the power level of anything in the god tier is so high, that we can’t even hope to balance the rest of the game around it.

My next goal is to get rid of the garbage tier characters. They are so bad that no one touches them, and it’s usually pretty easy to increase their power enough to get them somewhere between top, middle, and bottom. If they are somewhere in those three tiers (which gives you a lot of latitude actually), at least they are playable.
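If you collect tier lists from many testers, the five-tier template and the first two goals can be tallied mechanically. The following is only a sketch of one way to do that in Python: the character names are placeholders, the "most common vote" rule is an assumption (nothing above says how to aggregate lists), and ties are broken by whichever tier was seen first.

```python
from collections import Counter

# Tier names follow the template above, from strongest to weakest.
TIERS = ["god", "top", "middle", "bottom", "garbage"]


def consensus_tiers(tier_lists):
    """Return each character's most common tier across all testers.

    `tier_lists` is a list of dicts mapping character name -> tier name.
    Aggregating by most common vote is an assumption made for this sketch.
    """
    votes = {}
    for tester_list in tier_lists:
        for character, tier in tester_list.items():
            if tier not in TIERS:
                raise ValueError(f"unknown tier: {tier}")
            votes.setdefault(character, Counter())[tier] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def urgent_fixes(consensus):
    """List the characters to fix first: god tier, then garbage tier."""
    god = sorted(c for c, t in consensus.items() if t == "god")
    garbage = sorted(c for c, t in consensus.items() if t == "garbage")
    return god, garbage


# Placeholder characters and made-up votes from three testers.
lists = [
    {"Alpha": "god", "Beta": "middle", "Gamma": "garbage"},
    {"Alpha": "god", "Beta": "top", "Gamma": "garbage"},
    {"Alpha": "top", "Beta": "middle", "Gamma": "bottom"},
]
print(urgent_fixes(consensus_tiers(lists)))
```

A tally like this is no substitute for reading the testers' arguments, but it makes it easy to see when both secret tiers have finally come up empty.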

Akuma is god-tier in Super Street Fighter 2 Turbo.

Public Tier Lists

I really like it when playtesters all see each other’s tier lists. The debate this spawns is very useful for me to read (or overhear in person) and for the playtesters to sort out their ideas. Sometimes when someone put a character unusually high or low on the list, I dug deeper to find out that player really did know something most of the rest of us didn’t. Other times, that player is just crazy and the rest of the testers are happy to point that out. It’s also good to see what kind of consensus the testers come up with, like if they all rank a certain character as the worst, for example.

The biggest landmark moments in each of the games I balanced were when the tester communities consistently gave tier lists with no characters in the god tier or garbage tier. Once you’ve achieved that, the next goal is to compress the tiers. That means that you want the difference between the best and worst characters to be as small as possible. Notice that that means even if you have the same characters in the bottom tier that you did a month ago, you might have dramatically improved the game if all those “bad” characters are really only a hair worse than the tier above, rather than way worse.

Adjusting the Tiers

In all the games I balanced, I used the same approach of letting the top tier set the benchmark power-level. In Street Fighter, I already had an established top tier as a starting point from the previous game, but in Kongai and Yomi, it was somewhat accidental who ended up in the top tier. But early on, after the god tier was removed and it was pretty clear which characters / decks were top, I allowed that to be the target power level. In other words, the characters in that tier are “how the game is supposed to be.” Again, I didn’t plan exactly who would be here, but I accepted how it ended up and worked with it. So if the top tier is the target, it’s the bottom tier you should adjust the most. If the top tier is the intended power level, you don’t really want to mess up the good things you have going there. Instead, boost the bottom characters up and compress the tiers as much as you can, so you get the worst characters just barely below or equal to the best characters.

There are some psychological factors that I saw over and over again while making these adjustments. The first is that whenever I make a move or character worse (aka “nerfing”), players overreact. Sometimes that top tier creeps a little too high in power, or an otherwise average character ends up having something unexpected that’s crazily good, or a character has a move that really reduces the strategy in the game and needs to lose that in exchange for gaining something else. There are lots of reasons for nerfs.

I’ll use some made-up numbers to convey the general idea here. Imagine a move is at power level 9 out of 10, and that’s just too good for that character. Time and time again, I saw that if I made the power level an 8 out of 10, playtesters would complain that the move was worthless and put the character down at least one tier. This happened consistently, and even in the cases where 8 out of 10 was still too powerful and it really needed to be a 7. For some reason, players in every game seem unable to grasp the concept that a top tier character who is made slightly worse can still be a top tier character.

This is one of the cases where I think you just can’t listen to the playtesters. Ignore their first reactions to nerfs, let them play it more and get used to it, let them see if they can still be successful with the new version of the move, then take their feedback on that move or character more seriously.

The other psychological effect to know about is what happens when you increase a move’s power. I learned about this from Rob Pardo’s lecture on balancing multiplayer games at the Game Developer’s Conference, and I tried it on all the games I balanced, and I think Rob is right. He said that if you have a move that you’re not really sure how to balance, make it too powerful. If you make it too weak, then you run the risk of no one using it at all. Then, when you slightly increase its power, none of the testers will notice or care. They already decided that move is weak. Then if you make it slightly more powerful still, they still won’t care. Even when you inch it up past the reasonable level of power, it’s hard to get it on people’s radar and that makes it really hard to know how to tune the move.

Instead, Pardo said to start with the move too powerful. Then everyone will know about it and care about it. I did exactly this with T.Hawk, Fei Long, and Akuma in Street Fighter HD Remix, because I had trouble figuring out their power levels. Each one of those characters was the best character in the game at some point in development, and that meant I got lots of feedback from testers about these characters. It also gave me a sense of where the top of the scale even was. Sometimes my “too powerful” versions of a character would end up waaaaay too good, or sometimes just barely too good. By knowing where the upper limit was, it helped me pick appropriate power levels more quickly. That said, I did have to deal with the inevitable cries that follow all nerfs, but that just goes with the territory here.

Illusions in Tiers

Another point from Rob Pardo’s speech on multiplayer games was not to balance the fun out of things. I’m very conscious of this as well. Don’t just think about the game as some abstract set of numbers that has to line up. You also have to think about how people will perceive it and whether it’s actually fun. Pardo said that he likes the player to feel like the tools they have are extremely powerful, even though they are actually fair.

Tafari is unfair!

An example of this in one of my games is Tafari, the Trapper in Kongai. Tafari’s main ability is that the enemy cannot switch characters while fighting him. Switching characters is one of the game’s main mechanics, so fighting him is like playing rock, paper, scissors with no rock. It seems, at first glance, ludicrously powerful. But from the start, I gave Tafari several weaknesses and he loses many fights if he ends up having to fight on even footing. He’s best when you bring him in against an already-weak character to finish them off.

I knew Tafari was not too powerful. I tested him with many experts and they tended to rank him as middle tier once they got the hang of him. As we added new testers over time, probably nearly 100% of them claimed that Tafari was too strong. I refused to change him though and after a year of testing, the best players still ranked him as middle tier, while inexperienced players still ranked him as top. Tafari is an illusion.

I’m telling you this because you have to be very careful with feedback in cases where you intentionally made something feel more powerful than it actually is. It’s a success if you can pull that off though, because Tafari makes the game more interesting, creates lots of debates, and at the end of the day, he is balanced.

Counter Matches

In addition to the tier list, you should also be thinking about all the specific matchups. Street Fighter HD Remix, for example, has 17 characters and 153 possible matchups. For the version of Street Fighter before HD Remix, experts tend to separate the characters into four tiers (none of them are god tier or garbage tier), and they place Guile in the respectable second tier. Even though that means Guile’s power level is acceptable, he is severely disadvantaged in two specific matches: Vega and Dhalsim. Is it ok that an overall good character gets countered by two specific characters? Not really.

If these were weapons in an FPS or units in an RTS or characters in a team-based fighting game, then it might be acceptable. You pick up weapons in an FPS after the game starts, so their balance doesn’t need to meet the hard requirements of an asymmetric game. And units in an RTS and characters in a team-based fighting game are examples of local imbalances, which are fine (it’s the races and teams that need to be balanced). But in Guile’s case, you lock in your choice of Guile at the start of the game, then you are stuck with him the entire game, so it really is a problem if he has some bad counter matches, even though players rate him fairly highly overall.
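To make the bookkeeping concrete, here is a small Python sketch. The count of 153 is what you get for 17 characters when you count every unordered pairing including mirror matches (17 × 18 / 2). The matchup chart, the out-of-10 scoring scale, and the 3.5 threshold are all assumptions invented for the example; they are not real matchup data.

```python
from itertools import combinations_with_replacement


def count_matchups(num_characters):
    """Count unordered pairings, including mirror matches."""
    return sum(1 for _ in combinations_with_replacement(range(num_characters), 2))


def counter_matches(chart, threshold=3.5):
    """Find lopsided matchups hiding behind a decent overall rating.

    `chart[a][b]` is a's expected wins out of 10 games against b.
    Both the scale and the threshold are assumptions of this sketch.
    """
    return sorted(
        (a, b, score)
        for a, row in chart.items()
        for b, score in row.items()
        if score <= threshold
    )


print(count_matchups(17))  # 153

# Made-up numbers, only to show the shape of the data.
chart = {"Guile": {"Ryu": 5.0, "Vega": 3.0, "Dhalsim": 3.0}}
print(counter_matches(chart))
```

A character can average out fine across the whole chart and still show up in this list, which is exactly the situation described above.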

It’s really tricky to adjust anything in an asymmetric game though. How can we help Guile in just the Dhalsim match without affecting all the other matches? There’s no easy answer here, but I advise you to really solve the problem, rather than copping out.

My real solution to this problem was two-fold. First, for reasons unrelated to this particular match, I changed the trajectory of Guile’s roundhouse flash kick. This happened to help a bit against Dhalsim’s fireballs, so we’ll count that as a lucky accident. Second, one of Guile’s problems is that Dhalsim’s low punches can go under Guile’s Sonic Boom projectiles and hit Guile from across the screen, with no repercussions. I changed Dhalsim’s hitboxes so that Dhalsim now trades hits in this situation, rather than cleanly hits. This change has virtually no effect on any other match, so it’s a real solution to the problem.

A cheating solution would have been to special case this match and give Guile more hit points. This sounds attractive because you don’t have to worry about messing up other matches, but this non-solution feels really artificial. It messes with players’ expectations and intuitions about how many hit points Guile has.

A similar cop out would be to create a giant table in an RTS of every unit versus every unit and special case how much damage they all do to each other. Again, it messes with player intuition about how damaging each unit is, and creates an invisible, wonky system. I know you’re going to be tempted to use these types of special case solutions when balancing asymmetric games, but try your hardest to avoid them.

Conclusion

Start your design with some self-balancing forces and fail-safes if you can. Then go wild and create all your game’s diversity, then start the long road of playtesting. As you learn more from playtesting, change your course as you go. Start keeping track of tiers, first by fixing the god tier, then by fixing the garbage tier. Then compress the tiers so that even the bad characters are only slightly worse than the best characters. Finally, fix all the counter-matches you can by actually solving the puzzle, and avoiding cop out solutions.

Friday
Oct172008

Balancing Multiplayer Games, Part 2: Viable Options

In the previous article I divided the idea of balance into the two sub-concepts of viable options and fairness. I also defined the concepts of symmetric and asymmetric games, where the more varied the different starting options are that must be fair against each other, the more asymmetric the game is.

How do we make sure we have enough viable options during gameplay?

Yomi Layer 3

The worst thing you can have in a competitive multiplayer game is a dominant move (or weapon, character, unit, whatever). I don’t mean a move that is merely good, I mean a move that is strictly better than any other you could do, so its very existence reduces the strategy of the game. A dominant move also probably has no real counter, so even if the opponent knows you will do it, there’s not a lot he can do.

To protect against dominant moves, we should be aware of the concept of Yomi Layer 3. I wrote a whole article on just that, but I’ll quickly summarize it here. “Yomi” is the Japanese word for “reading,” as in reading the mind of the opponent (and it’s also the name of my strategy card game). If you have a powerful move and use it against an unskilled opponent, I call that Yomi Layer 0, meaning neither player is even bothering with trying to know what the opponent will do. At Layer 1, your opponent does the counter to your move because he expects it. At Layer 2, you do the counter to his counter. At Layer 3, he does the counter to that.

That might sound confusing, but it’s very straightforward in actual gameplay of real games. All it means is you and your opponent each have two options:

You: A good move and a 2nd level counter
Opponent: A counter to your good move and a counter to your counter

The designer generally does NOT need to design Yomi Layer 4 because at that point, you can go back to doing your original good move. Here’s an example Yomi Layer 3 situation that I created in Street Fighter HD Remix.

Honda wants to do his torpedo move to get close to Ken, but Ken throws fireballs to prevent this. I gave Honda the ability to destroy these fireballs with his torpedo, but only with the jab version of the move that doesn’t travel very far. If Honda can destroy a fireball with it and end up closer, that’s good for him. Ken can counter this by not throwing the fireball in the first place and letting Honda do the jab torpedo. As Honda is flying forward, Ken can walk forward and sweep, hitting the recovery of the jab torpedo.

Honda: torpedo that goes far or jab torpedo that destroys fireballs
Ken: fireball or walk up and sweep

Ken must quickly decide to either walk up and sweep (if Honda did a jab Torpedo) or block (if Honda did a fierce Torpedo).

I did not need to add anything to allow for Yomi Layer 4 though because Honda can counter Ken’s walk-up-and-sweep option by simply doing the original, full-screen torpedo. Yomi Layer 4 tends to wrap around like this in competitive games.

This concept is a reminder that moves need to have counters. If you know what the opponent will do, you should generally have some way of dealing with that. As you go through development of a game, always be asking yourself if various gameplay situations you find yourself in support Yomi Layer 3 thinking. If they don’t, there might be a dominant move in there somewhere, which is bad.
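One way to sanity-check a situation like the Honda vs. Ken example is to write it down as a small outcome table and look for a dominant move. The Python sketch below does that under loud assumptions: outcomes are reduced to +1 (good for Honda) and -1 (bad for Honda), which throws away all nuance, and "dominant" here means at least as good as every alternative against every response and strictly better against at least one. The function name is hypothetical.

```python
def dominant_moves(payoff):
    """Return the moves that dominate every other move in the table.

    `payoff[my_move][their_move]` is the outcome for me (higher is better).
    A move counts as dominant if it is never worse than any alternative
    and is strictly better against at least one opposing choice.
    """
    dominant = []
    for move, row in payoff.items():
        others = [m for m in payoff if m != move]
        if others and all(
            all(row[r] >= payoff[o][r] for r in row)
            and any(row[r] > payoff[o][r] for r in row)
            for o in others
        ):
            dominant.append(move)
    return dominant


# Simplified outcomes from Honda's point of view (+1 good, -1 bad).
honda = {
    "fierce torpedo": {"fireball": -1, "walk up and sweep": +1},
    "jab torpedo": {"fireball": +1, "walk up and sweep": -1},
}
print(dominant_moves(honda))  # [] because each option has a counter

# Hypothetical broken version where the jab torpedo also beats the sweep.
broken = {
    "fierce torpedo": {"fireball": -1, "walk up and sweep": +1},
    "jab torpedo": {"fireball": +1, "walk up and sweep": +1},
}
print(dominant_moves(broken))  # ['jab torpedo']
```

An empty result does not prove the situation is well designed, but a non-empty one is a strong hint that something in there has no real counter.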

Local vs. Global Balance

Does every possible situation in a game need to support Yomi Layer 3?
Answer: no.

Does every possible situation in a game even need to be fair to both players?
Answer: definitely not.

Remember that I defined fairness by the overall chance of winning, given different starting options. Think of that as a global term, in that it applies to the game as a whole from the start of gameplay until someone wins. But the local level, meaning a particular situation in the middle of gameplay, does NOT need to be fair. Even symmetric games like Chess are supposed to have unfair situations. When you have 3 pieces left and the other guy has 9 pieces left, it’s supposed to be unfair to you. Or in StarCraft, if we find that two Zealots beat (or lose to) 8 Zerglings--even though they cost the same resources to make--that is perfectly fine. We don’t care if local situations like that are unfair or not, we only care if Protoss is fair against Zerg.

Checkmate Situations

I call a situation a checkmate situation if it means that one player has almost certainly won, even though the game isn’t actually over. For example in Super Street Fighter 2 Turbo, if Honda lands his deadly Ochio Throw against Guile in the corner, he can then follow up with a series of moves (involving more Ochio Throws) that virtually guarantee victory. Human error could change the outcome, but as soon as you see that first move, you know it should be a checkmate.

Are checkmate situations ok? They clearly violate our requirement that there be many viable moves (Honda really only has one option here and Guile has no good options). They clearly violate the concept of Yomi Layer 3. And yet, the answer is that checkmate situations can be ok. It’s sooooo hard for Honda to get close to Guile in this match, that if he does, he basically deserves to do 100% damage. All the gameplay that takes place before the checkmate is pretty good, and even though Honda can do this abusive thing up close, the match is still heavily in Guile’s favor overall.

I’d like to point out the other side of this argument though. Some players think that even though Guile has the advantage in this match, Honda’s ability to repeat that Ochio Throw is too degenerate. They say yes he needs it to win, but the game would be better overall if things weren’t so extreme. If only Honda could get close to Guile a little more easily, then he would not need a checkmate situation.

I think Rob Pardo, VP of Game Design at Blizzard, echoed this sentiment in a lecture he gave at the Game Developer’s Conference on multiplayer balance. He said that “super weapons” in real-time strategy games are generally a bad idea. They leave the victim feeling that there is nothing they could have done (checkmate!). He explained that even though the Terran nuclear missile in StarCraft looks like a super weapon, it has many built-in weaknesses: a ghost unit must be near the victim’s base, there is a red targeting dot on the victim’s base, and a 10 second countdown is announced to the victim, giving him time to destroy the ghost to prevent the nuclear missile.

Pardo has a good point and so did the players who complained about Honda. Even though I think checkmate situations can be ok, it’s telling that when it was my turn to make the decisions, I removed Honda’s checkmate situation in Street Fighter HD Remix. In that game, I gave him an easier time getting close to Guile, but replaced his checkmate situation with a Yomi Layer 3 situation so there’d be more viable decisions throughout the match.

Lame-duck Situations

Lame-duck situations are just like checkmate situations, but with one difference: time. Honda’s checkmate situation takes something like three seconds to get through. But consider a similar situation in the fighting game Marvel vs. Capcom 2. In that game, each player has a team of three characters: one on the playfield and two on the bench. Players can call in one of their benched characters for an assist move at any moment, letting them attack in parallel with their main character and assist character at the same time. Or better yet, they can stagger the attacks so that each attack covers the recovery period of the other.

When one player is down to his last character, he can no longer call assists. Fighting with just one character against an opponent with two or three characters might as well be checkmate, almost all the time. The problem is that it takes excruciatingly long for the match to actually end. It takes so long, that I call that last portion of the game the lame-duck portion. Other fighting games are exciting right up to the last moment, but a lame-duck portion of gameplay means the real climax is somewhere in the middle, and then players are forced to act out a mostly pointless endgame while spectators lose interest. Yes, on rare occasions someone pulls off an amazing comeback, but comebacks also happen in games without lame-duck endings, so that’s not a good argument.

While a checkmate situation is maybe ok, you should try to avoid game designs that allow for long lame-duck endings. Both Chess and StarCraft have this undesirable property, and it just means that players often concede the game before the actual end. Those games also show that it’s not the worst thing in the world to have lame-duck endings (because Chess and StarCraft are good games), but you should still avoid them as a designer if at all possible.

Even after this, the game doesn't technically end if the victim has some obscure building hidden in the corner of the map. The ending conditions in StarCraft don't really match the true winning conditions.

Explore the Design Space

Design space is the set of all possible design decisions you could possibly make in your game. Whether your game is symmetric or asymmetric, it’s usually a good idea for your game to touch as many corners of the design space as possible. This helps give a game depth and nuance, but also tends to protect you from dominant moves.

For example, in the virtual card game I designed called Kongai, each character has four moves. When a move hits, it has a percentage chance to trigger an effect. For a given character, we could vary the damage, speed, and energy cost to come up with four different moves. If that’s all we did, though, we’d be missing out on a chance for more diversity in the game, and we’d get dangerously close to making some of those moves strictly better than others which would reduce the number of viable options. Instead, I tried to explore the design space as much as possible with different effects. One move can change the range of the fight from close to far, which is usually only possible before the attack phase. Another move deals enough damage to kill every character in the game, but only four turns after you hit with it. Another move can hit characters who switch out of combat, even though switching out usually beats all attacks.

The point is that by exploring the design space as much as possible, it’s a lot harder for players to judge the relative value of moves. How good is a 90% chance to change ranges during combat as opposed to a 95% chance to hit a switching opponent with a weak move? It’s hard to say and depends on a lot of factors, and that’s good because it means each move is likely to be useful in some situation and knowing when is an interesting skill to test. Incidentally, I call that skill valuation.

Players want you to explore the design space, too. When everything is too similar in a game, it feels like one-note design rather than a symphony. The more nuances and different choices you present, the more each player can express his own playstyle.

Wheat from the Chaff

Here’s my favorite quote from Strunk & White’s The Elements of Style:

Omit Needless Words

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell.

FIVE mana for this? Omit needless cards.

Treat your game design the same way. Yes you should explore the design space, but omit needless words, mechanics, characters, and choices. Although your primary goal regarding viable options is to make sure you’re giving the player enough options, your secondary goal should be to eliminate all the useless ones.

Marvel vs. Capcom 2 has 54 characters, which is ridiculously many. How many are viable in a tournament? I’ll say 10, and I’m being generous. I actually call that a success because coming up with 10 characters in a fighting game that are fair against each other is really hard. That said, it does look pretty bad to have more than FOUR TIMES that many characters sitting around in the garbage pile of non-viable choices. Compare this to Super Street Fighter 2 Turbo’s 16 characters, almost all of which are tournament viable, or Guilty Gear’s 23 characters, almost all of which are viable, and you see what a compact design looks like.

One genre of game is notable for intentionally creating an enormous number of useless options: collectable card games. Even though I claim Magic: The Gathering is one of the best designed games in the world, I’m judging the balance on an absolute scale of how many good cards/decks the tournament environment supports, not the ratio of viable to worthless. On that scale, we’d have to rate the game as a complete failure.

MTG’s Mark Rosewater defends the intentional inclusion of bad cards for design reasons, but this is only because the marketing department has brainwashed him into going along with their admittedly very successful rip-off scheme. Rosewater claims that bad cards are ok because they:

a) allow for interesting experimental mechanics that might end up being bad
b) test valuation skills because if all cards were equally good, there’d be less strategy
c) give new players the joy of discovering that certain cards are bad, as a stepping stone to learning the game
d) are necessary because even if they came out with a set that consisted entirely of known good cards from old sets, there’d still be only 8 tournament viable decks and the rest of the cards would not be used.

The solution to this problem is clear if we only cared about design and not rip-off marketing: print fewer cards. Reason a) is a great one, experimental cards that end up accidentally bad are fine. Reasons b) and c) are just silly. Saying the game would not have enough strategy if bad cards were removed is an insult to Mark’s own (terrific) game. Saying that new players need to discover the intentionally bad cards is even more silly because this comes at the cost of making sets overwhelming to new players and needlessly unwieldy for expert players. We all know the real reasoning here is to make players buy more random packs of cards to get at the few good ones.

Finally, reason d) is a blatant admission that the game should have fewer cards. Ironically, I’m not even sure d) is true. Maybe printing a large set of all good cards really would lead to more viable tournament decks than the game currently supports. If not though, they should stop printing all that chaff.

You could say that MTG proves that it’s really all about chaff, though. Maybe giving a few viable options amidst a sea of bad ones is good business when you sell by the pack. But we don’t see this in other genres and really we just haven’t seen anyone crazy enough to stand up to MTG on this issue and offer a competing card game that’s just as well designed but that eliminates all chaff. (A future Sirlin project?)

Double-blind Guessing

I used the technique of double-blind guessing in both my Yomi card game and my Kongai virtual card game (that one’s actually a turn-based strategy game dressed up like a card game). Anyway, the idea is to make all players commit to a choice before they know what the others have committed to. This is the same setup as the prisoner’s dilemma.

I learned this concept from fighting games. Though they appear to be games of complete information because you can see everything the opponent can see, fighting games are actually double-blind games. They come down to very precise timing and the moment you jump, you often don’t know that the other guy threw a fireball. You only know that 0.3 or 0.5 seconds ago he didn’t. It takes a small amount of time for the opponent’s move to register in your brain, and though it might seem insignificant, it’s actually critical to fighting games even working as strategy games at all.

Real-time strategy games like StarCraft have the same property, but on a much slower time-scale. You often do not know exactly what the opponent is building in his base at the moment you must decide what you should build. Even if you were able to scout his base, you might be working on information that’s several seconds old, so you have to guess what he did during that time.

If we were to remove the double-blind nature from my two card games Yomi and Kongai, and from fighting games and real-time strategy games, I think all of them would be broken. All those games need double-blind decision-making to be interesting. This design pattern is a way to increase the chances that you have many viable moves in your game because it naturally forces players into the Yomi Layer 3 concept I talked about earlier. Weaker moves become inherently better in a double-blind game because it’s easier to get away with doing them without being countered. I’ve even joked that some matches between the world’s best Virtua Fighter players are “a battle of the third-best moves.” Sometimes the players are so paranoid about doing their “best” option for fear of being countered, they fall back on a third best option that no one would ever counter (though it’s quite a sight when the opponent counters even that!). If no guessing was involved at all, players would not use third-best moves.
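
To make the double-blind idea concrete, here is a minimal Python sketch that uses plain rock, paper, scissors as a stand-in. It is not the actual rules of Yomi, Kongai, or any fighting game, and the function names and the `history` list are hypothetical, invented just for this illustration. When the second player gets to see the first player's move, he wins every time. When both players must commit before the reveal, no pick can be a guaranteed counter.

```python
# Hypothetical illustration: rock, paper, scissors as a stand-in for any
# game with counters. BEATS[x] is the move that x defeats.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def winner(a, b):
    """Return 0 if the first move wins, 1 if the second wins, None on a tie."""
    if a == b:
        return None
    return 0 if BEATS[a] == b else 1

def counter_to(move):
    """Return the move that defeats `move`."""
    return next(m for m, loser in BEATS.items() if loser == move)

def sequential_round(first_move):
    # Not double-blind: the second player sees the first move and counters it.
    return winner(first_move, counter_to(first_move))

def double_blind_round(choose_a, choose_b, history):
    # Each chooser sees only past rounds, never the opponent's current pick.
    a = choose_a(list(history))
    b = choose_b(list(history))
    history.append((a, b))
    return winner(a, b)

# With perfect information the second player wins no matter what.
print([sequential_round(m) for m in BEATS])  # prints [1, 1, 1]
```

In the double-blind version, each chooser is a function of past rounds only, so the only way to "counter" is to guess what the opponent will do from his earlier behavior.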

Playtesting

Finally, playtesting, especially with experts, is how you figure out where your problems really are. Do the experts ignore some vast portion of your game’s moves? Have they discovered a bunch of checkmate situations that you didn’t know about? Do you see them using a variety of strategies?

How to use playtests is really a whole topic of its own, but here are a few points to keep in mind. First, be skeptical of them. Gamers tend to overreact to changes and claim that no counters exist to some strategies when counters do, in fact, exist. It can take years to sort out what is really effective in a game, and playtesters during your beta are only on the first few steps of that long journey. If they find what looks like the best strategy in the game, it might just be that they have found a local-maximum. Maybe some radically different way of playing that they have not yet discovered ends up being more powerful. This is actually par for the course in fighting games.
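
The local-maximum point can be shown with a toy example. The sketch below is only an analogy, not a model of real playtesting: the `strength` list is made-up data where each index stands for a strategy and each value for how powerful it is, and `hill_climb` is a hypothetical helper that only ever tries small variations on the current strategy.

```python
def hill_climb(strength, start):
    """Greedy search: step to the better neighbor until no neighbor improves."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(strength)]
        best = max(neighbors, key=lambda j: strength[j])
        if strength[best] <= strength[i]:
            return i  # stuck: every small change looks worse
        i = best

# Made-up landscape: a small peak at index 2, a bigger one at index 6.
strength = [1, 3, 5, 4, 2, 6, 9, 7]

found = hill_climb(strength, start=0)
true_best = max(range(len(strength)), key=lambda j: strength[j])
print(found, true_best)  # prints 2 6
```

Testers who refine what they already know stop at the small peak. The stronger strategy sits on the far side of a valley that nobody has a reason to cross yet.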

Here I am testing Street Fighter HD Remix with some expert players.

That said, playtests are really all you have. Theory is not a substitute for experts playing against each other and trying their hardest to win. I think everyone knows they need playtests, but the hardest question is who do you listen to when all your playtesters disagree, and how do you know when playtesters are wrong about how powerful something is? That question is so hard that I’ll save it for part 4 of this series when I tell you how much trouble we’re really in trying to balance a game at all.

Conclusion

To ensure we have many viable options, building in counters with the Yomi Layer 3 system is a good start. Not all situations need this though, and checkmate situations might be acceptable, but you should avoid their longer cousins, lame-duck situations, if possible. Explore your game’s design space by offering moves as different as possible because this technique has a good chance of making all moves useful somewhere and it makes it very difficult to determine what the best moves really are. That becomes an interesting skill test for players. Eliminate all the worthless options because they confuse the player and add nothing, but they make you a lot of money in a certain genre. The double-blind guessing mechanic helps keep more moves viable than otherwise would be.

And finally, all the theory in the world does not substitute for playtesting.

Friday
Oct172008

Balancing Multiplayer Games, Part 1: Definitions

Balancing a competitive multiplayer game is not for the faint of heart. In this article I’ll define the terms that will let us know what we’re talking about in the first place, then in the second and third articles, I’ll pretend that we have some hope of solving the wicked problem of game balance and I’ll explain techniques to do it. Then in the fourth article, I’ll try to impress upon you what deep trouble we’re really in.

First, the terms. Let’s start with balance and depth as defined by the Philosopher King of game balance:

A multiplayer game is balanced if a reasonably large number of options available to the player are viable--especially, but not limited to, during high-level play by expert players.

--Sirlin, December 2001

A multiplayer game is deep if it is still strategically interesting to play after expert players have studied and practiced it for years, decades, or centuries.

--Sirlin, January 2002

This definition of balance is pretty good, but there are two concepts hiding inside that term viable options. On one hand, I meant that the game doesn’t degenerate down to just one tactic, and on the other hand, I meant that if there are lots of characters to choose from in a fighting game or races to choose from in a real-time strategy game, many of those characters/races are reasonable to pick. Let’s call the first idea viable options and second idea fairness in starting options, or just fairness for short.

Viable Options: Lots of meaningful choices presented to the player. For depth’s sake, they are presented within a context that allows the player to use strategy to make those choices.

Fairness: Players of equal skill have an equal chance at winning even though they might start the game with different sets of options / moves / characters / resources / etc.

Viable Options

The requirement that we present many viable options to the player during gameplay is what Sid Meier meant when he said that a game is a series of interesting decisions (a multiplayer competitive game, at least).

Not enough viable options.

If an expert player can consistently beat other experts by just doing one move or one tactic, we have to call that game imbalanced because there aren’t enough viable options. Such a game might have thousands of options, but we only care about the meaningful ones. If those thousands of options all accomplish the same thing, or nothing, or all lose to the dominant move mentioned above, then they are not meaningful options. They just get in the way and add the worst kind of complexity to the game: complexity that makes the game harder to learn yet no more interesting to play.
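
One rough way to spot options that are not meaningful is to check whether some other option is at least as good in every situation. Here is a minimal Python sketch of that check. The move names, the situations, and all the numbers in `payoff` are made up for illustration, and real games are rarely reducible to a small table like this.

```python
def dominated_moves(payoff):
    """payoff[move][situation] = how well the move does (higher is better).

    Returns the moves that some other move matches or beats in every
    situation while strictly beating them in at least one.
    """
    dominated = set()
    for move, results in payoff.items():
        for other, other_results in payoff.items():
            if other == move:
                continue
            pairs = [(other_results[s], results[s]) for s in results]
            if all(o >= m for o, m in pairs) and any(o > m for o, m in pairs):
                dominated.add(move)
                break
    return sorted(dominated)

# Hypothetical numbers: weak_punch is never better than big_punch.
payoff = {
    "big_punch":  {"vs_block": 1, "vs_jump": 3, "vs_throw": 2},
    "weak_punch": {"vs_block": 0, "vs_jump": 2, "vs_throw": 2},
    "throw":      {"vs_block": 3, "vs_jump": 0, "vs_throw": 1},
}

print(dominated_moves(payoff))  # prints ['weak_punch']
```

A move that shows up in this list adds things to learn without adding a decision. Either give it a situation where it wins, or cut it.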

For the sake of depth, we also hope that the player has some basis to choose amongst these meaningful options. If the game at hand is a single round of rock, paper, scissors against a single opponent, there is nearly no basis to choose one option over the other so it’s hard to apply any kind of strategy. And yet a game of Street Fighter might be decided by a single moment when you choose to either block, throw, or Dragon Punch, or a game of Magic: the Gathering might be decided by a single decision to play a Counterspell or not. These examples at first glance look like the rock, paper, scissors example, but the decisions take place inside the context of a match that has many nuances where each player is dripping with cues about his future behavior. In Street Fighter and Magic, the player does have basis to choose one move over the other, and more than one choice is viable, we hope.

Also for depth, we prefer if the meaningful choices depend on the opponent’s actions. Imagine a modified game of StarCraft where no players are allowed to attack each other. All they can do is build their base for 5 minutes, then we calculate a score based on what they built. There are many decisions to make in this game, and it might have several paths to victory, but because these decisions are purely about optimization--more like solving a puzzle than playing a game--they make for a shallow competitive game. Fortunately, in the actual game of StarCraft, you do need to consider what your opponent is building when you decide what to build.

While we require many viable options to call a game balanced, the requirement about giving the player a context to make those decisions strategically and the requirement that the decisions have something to do with the opponent’s actions are really about depth. They’re worth pointing out though because we should attempt to increase the depth of the game as we balance it, not decrease it.

Fairness

Fairness, in the context I’m using it here, refers to each player having an equal chance of winning even though they might start the game with different options. In Street Fighter, each character has different moves, in StarCraft each race has different units, and in World of Warcraft, each arena team has different classes, talent builds, and gear. Somehow, all of these very different sets of options must be fair against each other.

I want to stress that I am only talking about options that you’re locked into as the game starts. That’s a very important distinction. Options that open up after a game starts do not necessarily have to be fair against each other at all. Imagine a first-person shooter with 8 weapons that spawn in various locations around the map. Two of these weapons are the best overall, 3 are ok but not as good as the best weapons, and the remaining 3 are generally terrible but happen to be extremely powerful against one or the other of the 2 best weapons.

Is this theoretical game balanced? It certainly might be, meaning that nothing said so far would disqualify it. A designer could decide that he wants all weapons to be of equal power, but he need not decide that as long as each weapon is still a viable choice in the right situation. It might be fine to have two powerful weapons that players compete over, a few medium power weapons that are still ok, and some weak weapons that allow players to specifically counter the strong weapons. There could be a lot of strategy in deciding which parts of the map to try to control (in order to access specific weapons) and when to switch weapons depending on what your opponents are doing.

These eggs don't *have* to be balanced against each other.

By contrast, a fighting game with 8 characters designed by that scheme is not balanced because it fails the fairness test. Players choose fighting game characters before the game starts, but they pick up weapons in the first-person shooter example during gameplay. Being locked into a character that has a huge disadvantage against the opponent’s character is unfair.
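
The fairness test can be sketched as a check on a matchup chart. This is only an illustration under stated assumptions: the three archetype names, the win rates, and the `tolerance` threshold are all hypothetical, and real matchup numbers come from years of expert play, not from a table someone typed in.

```python
def lopsided_matchups(chart, tolerance=0.15):
    """chart[a][b] = a's win rate against b between players of equal skill.

    Returns (a, b, win_rate) for each pairing that strays from 50/50 by
    more than `tolerance`. Each pairing is reported once.
    """
    bad = []
    for a, row in chart.items():
        for b, win_rate in row.items():
            if a < b and abs(win_rate - 0.5) > tolerance:
                bad.append((a, b, win_rate))
    return bad

# Made-up chart following the weapon scheme above: a strong option, an ok
# option, and a generally weak option that hard-counters the strong one.
chart = {
    "strong":  {"ok": 0.6, "counter": 0.2},
    "ok":      {"strong": 0.4, "counter": 0.8},
    "counter": {"strong": 0.8, "ok": 0.2},
}

print(lopsided_matchups(chart))
```

As weapons you can swap during a match, these numbers might be fine. As characters you are locked into before the match starts, two of the three pairings are badly lopsided, which is what failing the fairness test looks like.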

Games that let players start with different sets of options are inherently harder to balance because they must make those sets of options fair against each other in addition to offering the players many viable options during gameplay.

Symmetric vs. Asymmetric Games

Let us call symmetric games the types of games where all players start with the same sets of options. We’ll call asymmetric games the types of games where players start the game with different sets of options. Think of these terms as a spectrum, rather than merely two buckets.

Symmetric                       Asymmetric
<------------------------------------->
Same starting options       Diverse Starting options

On the left side of the spectrum, we have games like Chess. In Chess, each side starts with exactly the same 16 pieces. The only difference between the two sides is that white moves first. Because of this different starting condition, we shouldn’t say that Chess is 100% symmetric, but it’s damn close. If Chess were the only game you had ever seen, you might think that the black and white sides are played radically differently; white sets the tempo while black reacts. There are entire books written about how to play just the black side. And yet if we zoom out to look at the many games in the world, we see that the two sides of Chess are so similar as to be virtually indistinguishable when compared to two races in StarCraft, two characters in Street Fighter, or two decks in Magic: The Gathering.

Monopoly is symmetric because the starting options are the same for all players, even though the pieces look different.

The more diversity in starting conditions the game allows, the farther to the right of our spectrum it belongs. So asymmetry, as we mean it here, is a measure of a game’s diversity in starting conditions. This is not meant to be an exact science, so there is no specific formula to determine where a game belongs on this spectrum, but it’s a handy concept anyway.

Let’s look at a few examples. StarCraft has three very diverse races so it belongs toward the right side of our spectrum. That said, even if the three races were as different as imaginable from each other, the number three is small enough that we shouldn’t put it at the far right (admittedly, this is a judgment call). Fighting games can have dozens of characters that play completely differently and they tend to have more asymmetry than most other types of competitive multiplayer games.

That said, individual fighting games can vary quite a bit in just how asymmetric they are. Virtua Fighter, for example, is an excellent and deep fighting game, but the diversity of characters is relatively low compared to other fighting games. All characters have a similar template compared to Street Fighter where some characters have projectiles, or arms that reach across the entire screen, or the ability to fly around the playfield. Meanwhile, Guilty Gear, a fighting game you’ve probably never heard of, has more diversity than any other game in the genre that I know of. One character can create complex formations of pool balls that he bounces against each other, another controls two characters at once, another has a limited number of coins (projectiles) that power up one of his other moves and a strange floating mist that can make that powered up move unblockable. It’s almost as if each character came from a different game entirely, yet somehow they can compete fairly against each other. Guilty Gear is possibly all the way to the right of our chart because it has both wildly different starting options (characters) and many of them (over 20!).

Magic: The Gathering is also extremely asymmetric in the format called constructed where players bring pre-made decks to a tournament. The variety of possible decks is staggering and tournaments usually have several different decks of roughly equal power level, even though they play radically differently.

First-person shooters tend to be very far toward the symmetric side of the spectrum, usually offering the same options to everyone at the start, except for spawning location. Remember that picking up different weapons during gameplay, or even changing classes during gameplay in Team Fortress 2, does not count as asymmetric for our purposes. (Again, because those different options don’t need to be exactly fair against each other.) Also, first-person shooters that do have asymmetric goals for each side often make the sides switch and play another round with roles reversed so that the overall match is symmetric.

Now that we’ve mapped out where some games fit on our spectrum, remember that this is not a measure of game quality. If your favorite games appear on the left (symmetric) side, that does not mean they are bad. If you like StarCraft more than Guilty Gear, you do not need to be upset that Guilty Gear is “more asymmetric.” The spectrum is simply meant to give us an idea about how different the starting options of a game are, not about the depth or fun of the game.

No matter where a game appears on this spectrum, it still needs to offer many viable options during gameplay to be balanced. In addition to this, the farther a game is to the right of the spectrum, the more it needs to care about balancing the fairness of the different starting options. In the next part of this series, I’ll talk about how we can design games that make sure to offer enough viable options and in the article after that, I’ll explain how we can attempt to create fairness in those pesky asymmetric games.
