Skrumgaer on Ratings--Use Data We Already Have
skrumgaer wrote
at 5:56 PM, Wednesday February 21, 2007 EST
“I should know, my rating is 1463.” --skrumgaer

In regard to ways to encourage the top players to keep playing games, Ryan has already given us data we can use to do it. Use each player's place profile (what percent first place, what percent second place, etc.) and the number of games played. We know this for every player.

This is the way to do it. Do a goodness-of-fit test of the player’s actual profile against that which would be expected if luck, and nothing else, determined how players placed in the game. (For background, see Wikipedia on “goodness of fit”.) If luck and nothing else governed the game, the typical profile would be 14-14-14-14-14-14-14. But with skill, a player would hope to increase the numbers on the left and reduce the ones on the right.

I will use my own profile as an example. As of 6:15 p.m. on February 21, I have played 96 games and my profile is 9-11-16-22-16-14-13. The goodness of fit takes the square of the difference between my actual and expected percentage, divided by the expected percentage, summed over all seven places. I call this the Test Against Plain Luck (TAPL). With these numbers, my TAPL is (9-14)^2/14 + (11-14)^2/14 + (16-14)^2/14 + ... + (13-14)^2/14, which comes out to 107/14, or about 7.64, which when multiplied by my games played (96) comes out to about 734 points.
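The calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in the post; the function name `tapl` and its parameter names are made up here, and the expected value of 14 is the rounded figure the post uses.

```python
def tapl(profile, games, expected=14):
    """Test Against Plain Luck, as described in the post.

    profile  -- seven percentages for 1st through 7th place
    games    -- number of games played
    expected -- percentage expected per place under plain luck
                (the post uses the rounded value 14)
    """
    # Squared difference from the expected percentage, divided by the
    # expected percentage, summed over the seven places.
    statistic = sum((p - expected) ** 2 / expected for p in profile)
    # Scale by games played so that more games give a larger score.
    return statistic * games


# Profile and game count quoted in the post.
print(round(tapl([9, 11, 16, 22, 16, 14, 13], 96)))
```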

Plain luck is not very tough to beat; even negative skill can do it. So I offer some other tests as well:

Test Against Own Worst Rating (TAOWR): instead of using 14-14-14-14-14-14-14 as the expected score, use the profile from your lowest rating.

Test Against Own Last 1500 (TAOL1500). As your rating goes up and down, you will pass 1500 periodically. Each time you are at, or close to, 1500, update the expected percentages with your current profile. This will toughen the challenge each time you do it. You can also have Test Against Own Last 1600, 1700, etc.

Test Against Known Bad Example (TAKBE). For this, use as the expected scores the profile of whoever it was who did the Phoenix Challenge. This will result in larger numbers for everyone since they are measuring their performance against pure negative skill.
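The three variants above differ from the TAPL only in the expected profile, so one function with a swappable reference covers all of them. The sketch below assumes that reading. The name `fit_score` is invented for illustration, and the lowball reference profile is a made-up placeholder, since the thread does not give the actual Phoenix Challenge numbers.

```python
def fit_score(profile, reference, games):
    """Goodness-of-fit score of a place profile against any reference profile."""
    if len(profile) != len(reference):
        raise ValueError("profile and reference must have the same length")
    if any(r <= 0 for r in reference):
        raise ValueError("reference percentages must be positive")
    # Same formula as the TAPL, but each place has its own expected value.
    return games * sum((p - r) ** 2 / r for p, r in zip(profile, reference))


PLAIN_LUCK = [14] * 7                       # TAPL reference from the post
# Hypothetical stand-in for a "known bad example"; not real data.
HYPOTHETICAL_LOWBALL = [5, 5, 5, 10, 15, 20, 40]

neko = [30, 18, 18, 12, 14, 4, 4]           # profile quoted in the post
print(round(fit_score(neko, PLAIN_LUCK, 50)))
print(round(fit_score(neko, HYPOTHETICAL_LOWBALL, 50)))
```

For TAOWR or TAOL1500 the reference would be a snapshot of the player's own profile at the relevant rating, which has to be recorded by hand because the profile page only shows the current figures.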

The advantage of the TAPL is that for a given profile, scores increase as more games are played. Players who sit on their high TAPL’s will eventually be overtaken by those who play more games.

The TAPL is easy to set up on a spreadsheet. You can copy the number of games played and seven percentages right off the player’s profile page and use Paste Special (Unicode Text or Plain Text, but not HTML) to put it into the spreadsheet.

I have calculated the TAPL’s for the top 5 players and got the following:

NeKo839: 50 games, 30-18-18-12-14-04-04, TAPL = 1757

Hatty: 104 games, 24-15-16-22-06-06-08, TAPL = 2474

Dassault: 82 games, 24-07-18-14-18-07-09, TAPL = 1494

tzisc: 129 games, 26-15-10-13-15-09-09, TAPL = 1963

kwizatz: 70 games, 20-18-17-07-14-10-12, TAPL = 650

NeKo839 has a strong showing but has played only 50 games. No one seems to do better than Hatty in taking losing games and making something out of them. On the other hand, kwizatz has not learned to do so and has a lower TAPL as a result.


Replies 1 - 10 of 11
the brain wrote
at 7:09 PM, Wednesday February 21, 2007 EST
Simple counterexample to the claim that this is a sound rating system:
10-10-10-10-10-10-40 scores the same as 40-10-10-10-10-10-10. The simple reason for this is that the formula you gave calculates an error from the expectation, and it makes no difference whether that error is positive or negative.
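The symmetry is easy to check numerically. The snippet below is a minimal sketch using the formula from the original post with an expected value of 14; the function name is invented here. Because the expected value is the same for every place, the division is factored out of the sum, which keeps the comparison exact.

```python
def tapl_statistic(profile, expected=14):
    """Per-game TAPL statistic: sum of squared deviations over the expected value."""
    # With a constant expected value, summing (p - e)^2 / e term by term
    # equals dividing the summed squares by e once.
    return sum((p - expected) ** 2 for p in profile) / expected


mostly_first = [40, 10, 10, 10, 10, 10, 10]
mostly_last = [10, 10, 10, 10, 10, 10, 40]

# Squaring discards the sign and the sum ignores order, so both
# profiles get the same statistic.
print(tapl_statistic(mostly_first), tapl_statistic(mostly_last))
```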

I can give only one more comment about this sort of suggestion, as I've seen several of them. If you DO suggest something like this (totally different from the current rating system), test it! Simulate statistically better players against worse players and look at the ordering your rating gives. Or at least come up with some sort of proof that it's sound. (It took me maybe a minute to come up with this counterexample; if you had given it some thought before posting, you could have saved the time it took you to write this.)
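One way such a test could look is sketched below. Everything about the model is an assumption made for illustration: seven players per game, each with a fixed skill weight, and finishing places drawn one at a time with probability proportional to weight. That is not how KDice actually plays out; it is only a way to produce "statistically better" and "statistically worse" players and see how the proposed score orders them.

```python
import random


def simulate_profiles(weights, games, seed=0):
    """Return a percentage place profile for each player.

    Places are filled from 1st to last; at each step one of the remaining
    players is picked with probability proportional to their skill weight.
    """
    rng = random.Random(seed)
    n = len(weights)
    counts = [[0] * n for _ in range(n)]  # counts[player][place]
    for _ in range(games):
        remaining = list(range(n))
        for place in range(n):
            pick = rng.choices(remaining, weights=[weights[i] for i in remaining])[0]
            counts[pick][place] += 1
            remaining.remove(pick)
    return [[100 * c / games for c in row] for row in counts]


def tapl(profile, games, expected=14):
    """TAPL as defined in the original post."""
    return games * sum((p - expected) ** 2 / expected for p in profile)


# Hypothetical skill weights: one strong, five average, one weak player.
weights = [3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1 / 3]
games = 2000
profiles = simulate_profiles(weights, games)

for label, idx in (("strong", 0), ("average", 1), ("weak", 6)):
    rounded = [round(p) for p in profiles[idx]]
    print(label, rounded, round(tapl(profiles[idx], games)))
```

If the score were a sound measure of playing strength, the printed scores should fall in the order strong, average, weak. Comparing the weak player's score with the average player's shows whether that holds under this model.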

But even better, come up with suggestions for improvement within the current rating system. I find it hard to imagine Ryan would even consider changing the rating system as drastically as this.
XicaDaSilva wrote
at 7:21 PM, Wednesday February 21, 2007 EST
@the brain
excellent point

on the other hand the Phoenix people will probably argue that it requires a lot of skill to finish 7th 40% of the time.
skrumgaer wrote
at 8:39 PM, Wednesday February 21, 2007 EST
Brain: I said that negative skill would beat plain luck. That is why I mentioned calibration with the other tests, especially the TAKBE. Also, this is a supplement to, not a replacement of, the current rating system.

--skrum
no Wolf wrote
at 8:53 PM, Wednesday February 21, 2007 EST
Those 'r some fancy mathamatations. What precisely does the number tell us, however?
Coffee_Time wrote
at 9:02 PM, Wednesday February 21, 2007 EST
Skrumgaer

Your scores are nice and simple. At the same time they only use the player's data. Since the ratings are dependent on who you played, I'm not sure any comparison (while neat) is valid.

A rule of thumb I've been using (which again assumes independence) is to find the percentage of games in which I've placed 1st to 3rd and 1st to 4th. If the first number is above 50% then I'm doing well. If it is not and the second number is then I'm just plain average.
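As a quick sketch, the rule of thumb can be written as a small function. The function name is invented here, and the "below average" label is an assumption, since the post does not say what to call the case where neither number is above 50%.

```python
def rule_of_thumb(profile):
    """Classify a seven-place percentage profile using the top-3 / top-4 rule."""
    top3 = sum(profile[:3])  # percentage of games finished 1st to 3rd
    top4 = sum(profile[:4])  # percentage of games finished 1st to 4th
    if top3 > 50:
        return "doing well"
    if top4 > 50:
        return "average"
    # Not covered by the post; label assumed for completeness.
    return "below average"


# Profiles quoted earlier in the thread.
print(rule_of_thumb([9, 11, 16, 22, 16, 14, 13]))   # skrumgaer
print(rule_of_thumb([30, 18, 18, 12, 14, 4, 4]))    # NeKo839
```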




They are independent of the other players' ratings, the very factor that depends o
Grunvagrr wrote
at 10:00 PM, Wednesday February 21, 2007 EST
A major flaw overlooked is that win %'s do not reflect the talent of the players you are playing against.

When I play vs 1500 players I indeed tend to win a crazy amount of games, thus I proceed to 1600s.

When I play vs 1600 players I indeed tend to win a large amount of games, thus I proceed to the next level.

Eventually I reach the cap.
Point being, winning vs people who do not know the nuances and tactics of this game (yes it's not all luck) is a lot easier due to their rating.

The top 100 take better tactical risks, etc.
'Tis no surprise to me that after this score reset a preponderance of the same faces have risen to the top of the game.


In a nutshell, it's easier to beat 1500 players than 1900 players.
I could go beat people who (go 'away' often, don't know the rules yet, are playing their first games, etc) and inflate my win %'s

The current system rewards players for winning vs QUALITY opposition.

I rest my case because I am a beast at kdice.
no Wolf wrote
at 10:30 PM, Wednesday February 21, 2007 EST
I find it more troublesome to play newbies. They do the craziest things. It's like a chimpanzee with a gun, you can't predict it...
the brain wrote
at 3:29 AM, Thursday February 22, 2007 EST
"Brain: I said that negative skill would beat plain luck. That is why I mentioned calibration with the other tests, especially the TAKBE."
You obviously failed to see my point.
again: 10-10-10-10-10-10-40 equals 40-10-10-10-10-10-10
or: 'very bad player' equals 'very good player' in score ( (40-14)^2/14 + 6*(10-14)^2/14 == 6*(10-14)^2/14 + (40-14)^2/14 ).

Changing the scores you test against makes little improvement to this, there will always be a lower score (if not 7th, then 4th, 5th or 6th, for example) to raise instead of 1st/2nd, because it makes no difference.


"Also, this is a supplement to, not a replacement of, the current rating system."
Please, do explain where this would fit in, as I fail to see it.
skrumgaer wrote
at 8:58 AM, Thursday February 22, 2007 EST
To The Brain

Skill is skill. A negative skill (like playing lowball poker) is as much of a skill as a positive skill. In kdice, there may be several skills, such as correct analysis of the odds (that may contribute to more first place finishes), "trucing" (leading to more third or fourth place finishes), or lowballing (like the Phoenix Challenge). The TAPL captures all skills. Who are you to judge which skills should be counted and which not?

To Grunvagrr:

If Ryan gave us a percentage distribution of percentages of points earned or lost (from +40 to -40 or whatever) a score could be computed from these (it would be a sum of 91 numbers, not 7 numbers) that would measure quality of opposition, but since Ryan has not given us these numbers, we have to do with what we have.

To The Brain

All the tests other than TAPL are calibrated to scores with known Ryan ratings, hence the tests are a supplement to, not a replacement of, the current rating system.

--skrum
the brain wrote
at 9:37 AM, Thursday February 22, 2007 EST
"Who are you to judge which skills should be counted and which not?"
I think it's fairly safe to say that the majority of people interested in ratings will agree that a player getting a lot of 1st's is a better player than a player getting a lot of 7th's, and want to see ratings that reflect this.
Try finding something like a chess tournament (or any game for that matter) where someone who loses the quickest will have equal ratings compared to someone who plays strategically.

"If Ryan gave us a percentage distribution of percentages of points earned or lost (from +40 to -40 or whatever) a score could be computed from these (it would be a sum of 91 numbers, not 7 numbers) that would measure quality of opposition, but since Ryan has not given us these numbers, we have to do with what we have."
http://en.wikipedia.org/wiki/Elo_rating


"All the tests other than TAPL are calibrated to scores with known Ryan ratings, hence the tests are a supplement to, not a replacement of, the current rating system."
You calibrate against percentages, which have nothing to do with the current Elo rating system.
And a supplement would be incorporating some new element into the current scoring (say, for the sake of an example, adding the number of games played to the Elo rating), not coming up with a completely different number unrelated to the current ratings.
KDice - Multiplayer Dice War
KDice is a multiplayer strategy online game played in monthly competitions. It's like Risk. The goal is to win every territory on the map.
CREATED BY RYAN © 2006