I'll give you an anecdote from Codex development, my customizable not-collectable card game. First though, a more general concept. When most playtesters are complaining that something is too weak or too strong...should you change it? You'd sure hope that they are right and that yes you should change it. That is kind of the point of having people playtest a thing in the first place, to find issues with it that you can improve. There is a danger to it though, so there's a judgment call you should be aware of.
I've heard Blizzard speak about this exact issue before, and I like the philosophy they mentioned. On the one hand, yes you want to improve the game over time. On the other hand, you actually don't achieve that by making every change everyone asks for. If you do that, you'll move some things in the wrong direction sometimes, and you'll weaken things that weren't too strong or strengthen things that weren't too weak. Another thing Blizzard has mentioned is that if you change stuff every time any balance claim is made, you end up training your players to not look very hard for counters. You train them to rely on you, the developer, as a crutch and they might not be reaching the higher level of play they should reach before making the claim in the first place. So Blizzard's point is some temperance is required: you do want to make changes, but only when they are warranted.
Often when I hear playtesters wanting a change, I take the opposite side and give the reasons why a change shouldn't be made. That kind of pushback creates at least *some* barrier to too many changes happening. If they were right in the first place, they shouldn't have too much trouble explaining why the points I made weren't good enough, or weren't as important as their points, or whatever, and that's fine. If they can make a good case that takes some counter-points into account, the change is probably good. Incidentally, with some people this is a totally straightforward and emotionless discussion, while with others it gets into drama. I have found I could make like ten times the progress by having 10 side discussions with the level-headed testers in the time I could have 1 discussion with the open group that includes...all types of people. So there's another thing to keep in mind. It's good to include more people for more viewpoints and to discover more problems, but it's also good to be efficient with fewer.
Back on point, I'd like to give some examples of playtest situations that were kind of unusual. Like I said, usually if a lot of people think there's a problem with something, there is. But knowing a few of the unusual counter-examples might help you recognize when you're facing just such a counter-example while balancing whatever game you're working on. So here are those unusual cases:
Tafari In Kongai
Tafari is a character in the Kongai virtual card game I designed for kongregate.com. He was intentionally a controversial, game-warping character. His ability is unique in the whole game: he prevents other characters from switching out against him. Characters switching in and out is a core mechanic of the game, so it's a huge deal that he disables this. It screams "broken" the first moment you hear about it. Tafari's other moves were designed with this in mind though, so he doesn't have any kind of reliable, explosive damage potential. He is kind of..."ok." Against some characters he has an advantage, against others he's not even that great. But wow does he feel unfair at first.
The first wave of comments was that he was absurdly unfair. I kind of had to ignore that though, because I expected it based on his "feel." When new players started playing the game, they usually claimed he was unfair too. What about experienced players who had a chance to play as him and against him for a while? Even they ranked him top tier for a while, but eventually he slipped to 2nd tier at best. He only ever had slight adjustments, and those had more to do with fixing bugs in how many times his poison darts proc'd. His ability just *feels* so crazy that people made wrong balance claims for quite a while. In the end, he was ok as-is.
Stolen Purples In Puzzle Strike
This is almost the same story. I even had Tafari in mind when I created Stolen Purples. This chip is game warping in that you play Puzzle Strike differently if it's in the bank than if it's not. At cost 4, many said it was just way too good. Was it? Usually when a lot of playtesters said a chip was the wrong cost, they were right. But Stolen Purples had the same thing going for it as Tafari: the very idea that you can steal purple chips from people *feels* so powerful that it can be hard to be objective about it. I didn't want to change it. After a while, one playtester said something pretty interesting. It was something like "I think we all subconsciously think that red chips (Stolen Purples is red) are supposed to suck, so we're thrown off by this one being good enough to buy. Probably some red chips need so much teeth that they legitimately compete with purple chips for your buy, and they give even more reason to care about having blues to protect yourself." Indeed.
While Stolen Purples is game-warping, it didn't really end up being too powerful, despite a ton of claims in the old days. It's merely "really really good."
Setsuki in Yomi
During Yomi's development, many people said Setsuki was too weak. Was she? There was a big problem in getting to the bottom of that. With other characters, when a big group of players said a character was weak, there was not much reason to question it. Just figure out where to add more power. With Setsuki though, the problem was that everyone was terrible at playing her. She plays in a strange way that's different from other characters. She often wants to make plays that would be bad with anyone else, but for her they will refill her hand. She wants to "waste" cards at just the right times to trigger her hand refill. She also has some nuances to her Bag of Tricks ability that you have to be aware of.
So of the people who said she was bad, *most* were playing her badly, and that tells us little, if anything. Then one very good playtester made the same claim. I explained to him that most people said she was bad because they didn't understand her, and I asked him whether he was at that level of understanding, or whether he knew all that, was totally good at her, and was making a "level 2" claim. He said he would get back to me.
Later he came back and said he had now reached level 2. He saw why other people's reasons for calling her weak were wrong, but he, knowing how to actually play her, still made the claim. THAT is good feedback. I asked him if he could get another good player who played her well to agree with him, and he was able to do that. So in this case, the testers ultimately were right, but the masses were not right about how much improvement was needed. Those on level 2 said only a bit of improvement was needed (most people said huge buffs were needed), so we made those slight changes and it was enough. Great.
Prohibition in Codex
The card Prohibition also reminds me of Tafari. It's game warping, though not as much as Tafari is. It allows the player to name a number, then opponents can't play units, spells, or upgrades that cost that much. "Is Prohibition too weak or too strong" has come up at basically every playtest of Codex ever.
Initially, I thought it was too weak if anything. The opponent can play around it by playing stuff of other costs. Because of the nature of how Codex works, it's easier to play around than it would be in Magic: the Gathering. In Codex you have more fine control over which cards you draw, and you can get rid of cards you don't want (like the ones that cost whatever they named) by playing them as workers (resources). Yes, the player of Prohibition is getting some advantage by making the other player play around it, but that's kind of the point. It doesn't seem like a huge amount of advantage considering they can do so many other things.
But if all that is right, why did this conversation come up over and over and over again? I remember one game where I said "Looks like you're in trouble. I guess you could cast Doom Grasp and be ok though. Oh...you can't because that's the cost they named with Prohibition. Well yeah tough luck." More and more stories like that came up over time. What's worse is that Prohibition is in a certain category of cards that you are able to get with 100% certainty on the first two turns. A card that can potentially shut down certain things is ok, but when you can so easily cast it so early every game it's kind of oppressive.
Yet another issue with it is that there aren't a lot of ways to remove it, and that's kind of on purpose. "Upgrade" cards are generally pretty reliable. Other types of cards are even harder to defend than you're used to in other similar card games, so it's kind of nice to have one type that isn't quite so easy to remove. Why don't ALL upgrade cards have the same problem as Prohibition then? Part of the answer is that most other upgrade cards...upgrade your own stuff. It's less important to get rid of some buff to the other guy's stuff than to get rid of a thing that's blocking your own plans. The other part of the answer is that if you do have one of the few things that can get rid of Prohibition, the other guy can name the cost of your answer to prevent you from even playing it.
So after like a thousand times of "Should something be done about Prohibition?" I have to say the answer is yes. People are still somewhat split on it, but it's come up way way more times than I'd expect if it were a case like Tafari or Stolen Purples or Setsuki, where simply getting better at the game was the solution. In other words, Prohibition kind of looks like it's one of those unusual exceptions, except maybe it isn't. Maybe it's just too damn powerful. Or maybe it's a bad idea to allow a game-warping effect to be so prevalent and easy to use. In any case, I revised it into a unit so that it's much easier to kill, and it now prevents the opponent from casting only units of the named cost, rather than units, spells, and upgrades. I think it will now play a role more in line with any other card, and we can finally get on to other discussions. There will of course be substantially more testing.