Hacker News | zsch's comments

This was my weekend project, which doubled as an introduction to the world of Facebook bots and node.js. I open sourced it: https://github.com/zchr/taketen. I'd love any feedback!


That's how it initially was. But I have now made it so that, as you say, it first checks to see if it's a multiple of 7. If it's a multiple of both 7 and 3 (regardless of whether or not you have played it before), then you will no longer be penalized.
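The rule as described could be sketched like this (a hypothetical helper for illustration; the actual TakeTen code may structure this differently):

```python
def is_penalized(n, played):
    """Return True if playing n incurs a penalty, per the rule described above.

    Multiples of both 7 and 3 (i.e. multiples of 21) are never penalized;
    other multiples of 7 are penalized only if already played.
    Hypothetical sketch -- not the bot's actual implementation.
    """
    if n % 7 == 0 and n % 3 == 0:
        return False  # e.g. 21, 42: exempt regardless of play history
    return n % 7 == 0 and n in played
```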


I think I understand why you decided to do that, but it might be worth modifying the explanation. I tested my understanding of the rules by trying 21 and was surprised. Not that it's a big deal!


I agree. Just updated.


That makes complete sense. I originally considered making a small noise when the sum was added to the score, though I'm leaning more towards flashing the updated score with a different color. Thank you for the feedback.


That's a great idea! The trick will be finding a place to display it. Maybe I'll add a button next to the score, and clicking it will reveal an overlay with played numbers.


A button would be good, but it would be even better if the number showed up as soon as you played it. That would cut down on unnecessary clicks and make the game more intuitive to play.


You could just display the smallest multiple of seven that hasn't been used yet.
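That suggestion is a one-liner of game logic (sketched in Python for illustration, though the bot itself is node.js):

```python
def smallest_unused_multiple_of_7(played):
    """Return the smallest positive multiple of 7 not in the set of played numbers."""
    n = 7
    while n in played:
        n += 7
    return n
```

For example, after 7 and 14 have been played, this returns 21.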


Thank you for pointing that out. No, it should be on a game-by-game basis. It should be fixed now.


Thank you for the article. Believe me, I am very much aware that this is elementary. I hoped to make it clear – but apologize that I didn't – that making predictions requires incorporating many, many factors. It was a fun script for me to code, and I was excited to see it work the night I wrote it. But I also understand that it will need more testing, and far more information, to be anywhere close to accurate.


I'm sorry it came off that way. I am very aware that the script in its current form does not come remotely close to doing justice to all of the data that's out there and necessary to incorporate. I meant to make it clear in the article that this script is completely elementary – it was a fun thing to code, but nothing remotely resembling what it takes to predict sports with accuracy.


Excellent idea. And it will provide a much faster way to gauge its accuracy as I adjust the script to accommodate more sources of information.

I actually just started that book and so far so excellent.


Yeah, for now... They're most definitely a thing, though I have a document's worth of baseball elements I hope to incorporate.


Are they?

    jerf@jerfhom:~$ python
    Python 2.7.3 (default, Sep 26 2012, 21:51:14) 
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import random
    >>> 94.0/(94+68)
    0.5802469135802469
    >>> winp = 94.0/(94+68)
    >>> games = []
    >>> for x in range(50):
    ...     games.append('w' if random.random() < winp else 'L')
    ... 
    >>> ''.join(games)
    'wLwLwwwwwLwLwLwwwLLLwLwwLwLLLwwLwwwwwLwwwLwwLLwLLw'
In my full simulation of 162 games, the longest streak was a 7 game losing streak, despite the higher win percentage. Of course you'll get different results each run; my next run produced a 9 game winning streak, which some quick Googling suggests is in line with what happened in 2010.
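The "full simulation" described above can be reproduced with a short script (a sketch under the same assumption of independent, identically distributed games; exact streak lengths will vary run to run):

```python
import random
from itertools import groupby

def longest_streaks(winp, games=162, seed=None):
    """Simulate a season of i.i.d. games; return (longest W streak, longest L streak)."""
    rng = random.Random(seed)
    results = ['W' if rng.random() < winp else 'L' for _ in range(games)]
    longest = {'W': 0, 'L': 0}
    for outcome, run in groupby(results):
        longest[outcome] = max(longest[outcome], len(list(run)))
    return longest['W'], longest['L']

# A .580 team (94-68) over a 162-game season:
w, l = longest_streaks(94.0 / (94 + 68))
```

Run it a few times and you'll routinely see losing streaks of 6+ games despite the winning record, which is the point: long streaks fall out of plain randomness.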

Combine this with the fact that real play is not drawn uniformly (you may play a much worse team against which you have a much better win percentage for several games in a row) and I don't see much need for some sort of meaningful, statistically-predictive "streak" to explain game results.


The 2012 data I used as the basis of my program actually had the same thing you describe – the longest streak was an 8 game losing streak despite having more wins than losses overall.

And I understand exactly where you're coming from. This is very preliminary, and if anything it was good coding practice for me. Though I very much intend to incorporate more significant factors like the lineup, the opposing team, and their history.


First improvement: do this for every team ever. Then combine for all teams, first in an individual season, then try basing the win% iteratively on more history.

Based on these models, you should have some good examples of selection bias, and see how the model changes based on what you are not testing for but what is implicit in the data – since the data is merely a set of samples generated by one iteration of the (unknowable to some degree) true talent functions for each team (player, lineup decisions, injuries, close calls by an ump, etc.).

If you're interested in going down the rabbit hole, there are tons of people who can show the way (and they're nice! At least tangotiger is way nicer than he needs to be in listening to people who have put no effort into understanding what is good and what is beginner's blind bliss).

Hot and cold streaks are just random variance, and so is whether balls are hit within reach of fielders or safely out of reach, given a certain contact quality. Ground balls, fly balls, infield pop-ups, and line drives all have vastly different tendencies to fall for a hit – line drives run ~.600–.700 BABIP if I recall, fly balls ~ low .200s, ground balls ~ .300, pop-ups near 0. The point is these are all known, to some degree, from the historical data.
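Using the rough BABIP figures quoted above (approximate recollections, not authoritative numbers), an expected BABIP for a given batted-ball mix is just a weighted average:

```python
# Rough BABIP by batted-ball type, per the approximate figures above.
BABIP = {'line_drive': 0.65, 'fly_ball': 0.22, 'ground_ball': 0.30, 'pop_up': 0.02}

def expected_babip(batted_balls):
    """Weighted-average BABIP for a mix of batted-ball counts,
    e.g. {'line_drive': 20, 'ground_ball': 45, ...}."""
    total = sum(batted_balls.values())
    return sum(BABIP[t] * n for t, n in batted_balls.items()) / total
```

A hitter whose contact skews toward line drives will project to a much higher BABIP than one who pops up a lot, independent of any "streakiness."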

If anyone wants to explore this stuff further, let me know and I can point you to the right spots for your specific interest.

