Update on scoring, accuracy, and Review Mode/Activity Tracker situation


#1

Hi everyone,

As originally discussed here, we've wanted to let players know that scores may change slightly, namely by going up. Obviously everyone likes to get more points, but we also want to be clear with you about why this is happening, particularly since we aren't changing the points system itself.

Following that last post, where we had to walk back some of our original thinking and re-analyze, here is an update! Bear with me, because the scoring change is actually part of fixing the discrepancies between Review Mode and the Activity Tracker. The TL;DR version: Review Mode (following one bug fix) has wound up being the more accurate of the two, so we're going to fix the Activity Tracker so that it syncs up with what Review Mode shows you. As a result, you'll sometimes get more points and won't be negatively affected by sloppy TBs. If you would like to understand exactly what goes into all this, read on!

1. First, the remaining issue with Review Mode and the Activity Tracker is:

Well, in a nutshell, although your accuracy stats balance out in terms of cubes' final consensus, the code for Review Mode and the code for the Activity Tracker currently look at different consensus calculations. The Activity Tracker does not correctly account for how your trace actually gets put into consensus. So, naturally we've wanted to fix that.

2. To have the Activity Tracker match the % you see in Review Mode, we have to alter the scoring code.

Again, this doesn't mean changing the actual metrics that go into the points you receive. But we have to change the scoring code to actually score everyone by what's wound up in the consensus; this has been overdue. And the Activity Tracker relies on the scoring info. Right now, if you are Player 2, you are scored against the TBer as if the TBer singlehandedly produced consensus, which has meant that even when your trace as Player 2 is accepted into the real consensus, you can be scored as if it wasn't.

When we alter the scoring code, your score as Player 2 will tend to be higher insofar as you won't be punished for undercoloring, just for overcoloring. So if the TBer adds something that Player 2 doesn't, there's no scoring penalty to Player 2, but if the TBer doesn't add something that Player 2 adds, Player 2 will still receive the same score as they would have in the old scoring code (a slightly penalized score).

However, while everyone might appreciate a higher score, we know that it's much more common for a TBer to miss stuff, and for Player 2 to correctly add it. Also, we would rather penalize undercoloring in general. So...

3. We'd like to additionally change the % agreement threshold at which segments are added to consensus.

We want to make it so that if 2 enfranchised players play a cube, only 1 of them has to add a segment for it to be included in consensus. (For 3 players, 2 of them would have to add it; for 4, still 2; in the rare case of 5, we'd need 3 to add.)
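The counts above are consistent with "half of the enfranchised players, rounded up." Here is a minimal sketch of that rule in Python. The function name `votes_needed` is hypothetical (it is not the actual scoring code), and the sketch assumes every enfranchised player counts equally, ignoring player weights and disenfranchised players.

```python
def votes_needed(num_players: int) -> int:
    """Number of enfranchised players who must add a segment for it to
    be included in consensus (hypothetical helper: half, rounded up)."""
    if num_players < 1:
        raise ValueError("need at least one player")
    return (num_players + 1) // 2


# Reproduce the cases listed in the post: 2, 3, 4 and 5 players.
for n in range(2, 6):
    print(f"{n} players -> {votes_needed(n)} must add the segment")
```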

This would flip things around such that Player 2 would be penalized for missing something the TBer found, but Player 2 would not be penalized for finding something the TBer missed. So now Player 2 would still be earning more points and only being penalized when it made the most sense to penalize. And Player 3 would wind up being less penalized in the long run by a bad trace from the TBer or Player 2, too.
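To make the "flip" concrete, here is a small Python sketch comparing what Player 2 would be penalized for under each threshold. It is only an illustration: the helper names and segment IDs are made up, and it assumes the current rule for 2 players is that both must add a segment, which is inferred from the description in section 2 and not stated outright.

```python
def consensus(traces, needed):
    """Segments added by at least `needed` of the given traces."""
    counts = {}
    for trace in traces:
        for seg in trace:
            counts[seg] = counts.get(seg, 0) + 1
    return {seg for seg, count in counts.items() if count >= needed}


def errors(trace, consensus_set):
    """Return (missed, extra) segments for one trace versus consensus."""
    return consensus_set - trace, trace - consensus_set


# Made-up example: the TBer and Player 2 agree on s1 only.
tber = {"s1", "s2"}
player2 = {"s1", "s3"}

# Assumed current threshold: both of the 2 players must add a segment.
both_needed = consensus([tber, player2], needed=2)
# Proposed threshold: 1 of the 2 players is enough.
one_needed = consensus([tber, player2], needed=1)

print(errors(player2, both_needed))  # nothing missed, s3 counts as overcoloring
print(errors(player2, one_needed))   # s2 counts as missed, nothing extra
```

With the assumed current threshold, Player 2 is only marked down for the segment the TBer did not add; with the proposed one, only for the segment the TBer found and Player 2 missed.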

Possible tradeoffs include easier merger growth and someone being able to deliberately overcolor for more points, but since it's always easier to spot a merger than to spot a missing branch, and since you would have to know exactly what player # on a cube you were in order to make overcoloring work for you, HQ isn't super concerned about these issues. One other thing to consider is that with the threshold alteration, disenfranchised players would wind up affecting consensus less than they currently do, but HQ would like to stress that those players already barely affect it.

4. So, we're pretty convinced of the benefits on this end, but do you have any questions, any concerns we haven't addressed here, etc.?

Let us know in the comments and we'll try to clarify the above! If a visual example would help, I have something in mind based on what Chris showed the rest of us today. We don't expect to be ready to make these two changes (scoring and consensus threshold) for at least a few more days, because we still have to make certain database preparations.


#2

Takes some reading, lol but looks good to me :slight_smile: and might finally cause less groaning and verbal swearing in being the "snapper" or 2nd player lol :stuck_out_tongue: ty!


#3

Sounds good in principle, but after this change the scythes will have to be more alert, since every merger the first or second player makes will also spawn new cubes in Relics. And we will also get a lot more dupes to fix.


#4

Yes, Chris says that's another tradeoff we may have to watch out for. If necessary we could change Level 2 spawning to accommodate it, like if people overall appreciated the new system and we didn't think Relics would grow so slowly that it became painful or something.


#5

I have long thought "what if lv1's dropped from wt 3 spawning to wt 2 like lv2s?" Now with last scythe wins, while we will still need to be as vigilant as always, it is much much easier to remove mergers no matter how fast they grow lol, and it'll make lv1 cells grow faster, which afaik has always been a...wish of HQ lol.


#6

I need a bit of clarification. Are you saying that everything the first two tracers trace in gets added to the cube? As in player A traces their trace, and it goes in (kinda how it is now tbh, just nothing spawns from it) Then player B traces a completely different trace, do both traces go in? Do they both spawn stuff? then if they both spawn stuff, when player C puts their tiebreaker trace in, does stuff automatically just disappear? And then how does this affect player 3? if they both over and undercolor if a and b have put in drastically different traces?


#7

Here are some answers I've confirmed with Chris:

Are you saying that everything the first two tracers trace in gets added to the cube?

Yes.

As in player A traces their trace, and it goes in (kinda how it is now tbh, just nothing spawns from it) Then player B traces a completely different trace, do both traces go in?

Yes.

Do they both spawn stuff?

As long as Relics require only weight 2 to spawn, then yes. With Artifacts, no, because they still need weight 3 to spawn. Like annkri observed, Scythes may need to be more vigilant on Level 2, or we'll have to consider changing the spawn threshold; though like Nseraf observed, "last Scythe wins" makes this much less dangerous to experiment with than it would have been a year ago.

then if they both spawn stuff, when player C puts their tiebreaker trace in, does stuff automatically just disappear?

Yes, anything spawned from a segment that was only added by Player A or Player B would despawn if Player C didn't agree.

And then how does this affect player 3? if they both over and undercolor if a and b have put in drastically different traces?

Player C, acting as a tiebreaker, will get points for agreeing with either Player A or Player B. If Player C's trace is itself wildly different from both of the others (either because none of the three traces agree or because A and B agree but C does not), then they will be considered wrong, but that's not much different from what happens to Player C now.
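For anyone who wants to see the A/B/C case worked through, here is a rough Python sketch. The segment IDs and names are invented for illustration, and it only models the "how many players added this segment" rule (1 of 2, then 2 of 3). It does not model player weights or the spawn weights for Relics and Artifacts.

```python
from collections import Counter


def consensus(traces, needed):
    """Segments added by at least `needed` of the given traces."""
    counts = Counter(seg for trace in traces for seg in trace)
    return {seg for seg, count in counts.items() if count >= needed}


# Invented traces: A and B agree on s1 only; C sides with B and adds s4.
player_a = {"s1", "s2"}
player_b = {"s1", "s3"}
player_c = {"s1", "s3", "s4"}

# With 2 players, 1 vote is enough, so everything A or B traced goes in.
before_c = consensus([player_a, player_b], needed=1)
# With 3 players, 2 votes are needed, so C acts as the tiebreaker.
after_c = consensus([player_a, player_b, player_c], needed=2)

print(sorted(before_c))             # ['s1', 's2', 's3']
print(sorted(after_c))              # ['s1', 's3']
print(sorted(before_c - after_c))   # ['s2'] drops out, so its spawns would despawn
print(sorted(player_c & after_c))   # ['s1', 's3'] where C agreed with A or B
print(sorted(player_c - after_c))   # ['s4'] where C agreed with nobody
```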

I suppose you could argue that changing things this way treats TBs as less valuable, increasing confidence in the next two players, but I'm not sure that's bad. It may just be that we've been wrong to imagine that the first trace from an enfranchised player is sufficiently likely to be truthful such that it should be presumed truthful; one way to guarantee a more sufficient likelihood would be to raise the TB/enfranchisement threshold to something excruciatingly high, but even super talented players can experience a 10-15% accuracy drop sometimes, and we can't inhibit cell growth too much. So as we tweak our system, we may have finally discovered that although the ability to TB should still be special, practical limits on its specialness also create a situation where we need to trust the next two players a little bit more than we have been.


#8

Is it possible to make sure player 2 is also a player who is over the TB threshold? I think that would mean a lot fewer mergers than if a player with an accuracy of <=70% was player two.


#9

Awesome, ty for the clarification. I think in light of your reply that it would be a really really bad idea to ever have lvl 1 cells need less than wt 3 to spawn. There are too many newbies that like to fill in the entire cube and it would be a lot of work on scythes to have to chase those down constantly.

And I suppose that if it becomes too much of an issue in the level 2 cells, y'all could easily raise the spawn weight to 3. I cannot imagine it will slow down growth too horribly much.

I love this idea!!


#10

dunno, i think the possibility of 2 noobs filling out the exact same cube whole is insignificant and it'd make the artifacts grow faster than turtle pace lol. (also new players who haven't done 60 have wt 0.1 lol) (and tbh with lsw and wt 3 reap anymore mergers in new/lsw cells are not a prob, one reap and they're away lol unlike msty carpet mergers lol....freeze plz? :stuck_out_tongue: )

and yeah I like ann's idea as well (and a lot less missing stuff as well lol)


#11

It wouldn't have to be 2 newbies tho. As it stands now in lvl 2 cells, if the first player does a normal trace, and the 2nd player decides to be rude and fills in the entire cube, everything either player traced will be in consensus, which would mean an entire filled-in cube will spawn. At least until it gets traced by a third player, or a scythe gets to it. I think it would be an error at this point to take the wt 3 away from level 1 cubes, especially considering how many new players think the whole point is to fill in the whole thing. And you and I both know that if someone traces fast enough, they can circumvent the disenfranchisement/TB threshold. It just seems like if the weight gets lowered enough, we will be spending all of our time chasing mergers.


#12

I think so far it would be safe to say that while we wouldn't be requiring a certain accuracy for Player 2, we would not be dropping Level 1 spawn weight to 2 either; that would definitely be unwise. If you folks think Level 2 spawn weight going to 3 wouldn't be a problem, though, that's encouraging in case we need to consider it.


#13

I wouldn't like that lol. (lv2 thingy)


#14

I don't think it'd be a problem. I'd rather have higher weight needed to spawn stuff than be unable to play b/c ppl made so many mergers it crashed the game :stuck_out_tongue:


#15

I'd rather RG all the cells than increase spawn wt in lv2 lol or just not do the player 1 adds and player 2 adds all adds lol, some lost points as player 2/3 is not the end of the world. :stuck_out_tongue:


#16

Why do you hate the idea so much? As scythes, it won't affect our reaping spawning stuff, and lvl 2 cells tend to get played more, so they should still grow at a similar speed, and it may even make retros more likely to be awarded in a reasonable time frame.


#17

What if, to combat the inevitable slowness in cell growth/completion in lv2 if spawn wt is increased to wt 3, the wt cap in lv2s is lowered to 3 from 4? +1 player to spawn, -1 player overall. And imo player 4 is not required anyways: it spawns at wt 3, so a 4th is not required lol. Scythes can fix the errors, which the 4th almost never does, if required.


#18

I am also not sure increasing the weight in lvl 2 is a good solution. Lowering the weight cap in both lvls could work for getting the cells finished faster, but not if you need wt 3 to spawn, because I think you would get too few available cubes. Also, I am not sure if that could be a problem for people with wt 0.1, who are playing in wt 3 cubes a lot of the time, if I have observed correctly. But as it is now, the fourth player is not really adding a lot of extra accuracy and can sometimes add mergers after the first scythe has checked the cube.
For me, getting lower points for some cubes because of a bad TB is not really that important, but I can see it being a big issue for new players.


#19

0.1 wt ppl are not bound to wt cap, only completed cubes (admin/ 2/2 voted), it's why they can play wt 4 cubes as well as admin wted ones. :slight_smile:


#20

I'm not advocating for raising the spawn weight currently; I just think it is a viable option if runaway mergers become an issue with the new system.

I think my concern with lowering the weight cap to 3 would be cell availability. We seem to struggle with that at times currently, and I would imagine lowering the cap would make that issue worse.