Hacker News
Show HN: MovieChat.org – Archive and Replacement for IMDb Message Boards (moviechat.org)
103 points by JimSmith84 on Feb 15, 2017 | 69 comments


Hi Everyone,

My name’s Jim, and I created MovieChat.org as an archive and replacement for IMDB’s message boards which are shutting down this week. For those of you not familiar, the IMDb message boards allowed you to discuss any single movie or tv show with others (there was a separate forum for each movie/show). IMDb recently announced they were shutting down the message boards and its users were furious (there's a petition with close to 10k signatures here: https://www.ipetitions.com/petition/petition-to-keep-the-imd...). I ventured out to create an archive (of all the existing posts) and replacement and hence MovieChat.org was born.

Key Features of MovieChat.org:

1. Any movie/show on IMDB is also on MovieChat.org (over 4 million and counting) - we have separate boards for each movie/show, just like IMDB

2. I backed up most of the posts for IMDB’s top 10,000 movies/shows - most existing conversations on IMDB should also appear on MovieChat.org - we have over 3 million posts already (and I'm working non-stop to back up even more from IMDB)!

Please visit http://MovieChat.org, join or start a discussion, and let me know what you think. If you like it, please spread the word. If there’s anything I can improve, just email me (jim@moviechat.org) and I’ll get on it.

Jim jim@moviechat.org


This is awesome and much needed.

Just throwing it out there, but would you consider making a dump of the data you scraped that could be used by data scientists? Maybe as a torrent or something like that? Data about movies and what people say about them could form the basis of a lot of NLP projects.

What other big datasets are there for forum post text data? The reddit dataset most immediately comes to mind, and I've also seen a similar one for HN comments. Any others?
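A minimal sketch of what such a dump could look like, assuming each archived post is a dict with hypothetical fields (`board_id`, `author`, `posted_at`, `body`); MovieChat.org's real schema isn't described in the thread. Gzipped JSON Lines (one JSON object per line) is a common choice for multi-million-record text dumps because it can be written and read as a stream:

```python
import gzip
import json
import os
import tempfile
from typing import Iterable, Iterator


def write_dump(posts: Iterable[dict], path: str) -> int:
    """Write posts as gzip-compressed JSON Lines; return how many were written."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for post in posts:
            # ensure_ascii=False keeps non-English posts readable in the dump
            fh.write(json.dumps(post, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_dump(path: str) -> Iterator[dict]:
    """Stream posts back out without loading the whole file into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only
    sample = [
        {"board_id": "tt0000001", "author": "user_a", "posted_at": "2017-02-01", "body": "Great ending."},
        {"board_id": "tt0000001", "author": "user_b", "posted_at": "2017-02-02", "body": "I disagree."},
    ]
    path = os.path.join(tempfile.mkdtemp(), "posts.jsonl.gz")
    written = write_dump(sample, path)
    print(written, "written,", sum(1 for _ in read_dump(path)), "read back")
```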


ArchiveTeam's web archives will be available to everyone without restrictions or profit as usual.


Well, thanks to your comment, I just found out ArchiveTeam exists.

Thank you!


Where can you download the boards?


Yeah that's definitely something I could do. Are there any proposed projects you know of that could benefit from this type of data?


Everything from simple sentiment analysis, to archive.org, to another mirror. I hope that does not discourage you from releasing the data.

Edit: I see the other comment about archive team already collecting and releasing this data, for free in an open format. I think that will be a good first source as well.
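As a toy illustration of "simple sentiment analysis" on board posts, here is a lexicon-based scorer. The word lists are made up for this sketch and far too small for real use; an actual project would use an established lexicon or a trained model:

```python
import re

# Hypothetical, deliberately tiny lexicon for illustration only
POSITIVE = {"great", "love", "loved", "brilliant", "masterpiece", "good"}
NEGATIVE = {"boring", "awful", "hated", "terrible", "bad", "worst"}
NEGATORS = {"not", "never", "no"}

TOKEN_RE = re.compile(r"[a-z']+")


def sentiment_score(text: str) -> int:
    """Return (#positive - #negative) words, flipping a word's polarity
    when it directly follows a negator such as "not"."""
    tokens = TOKEN_RE.findall(text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(text: str) -> str:
    """Map the numeric score to a coarse positive/negative/neutral label."""
    s = sentiment_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```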


I think the priority was to reenable discussion of new movies and TV shows, that would be useful moving forward. But maybe they could make an API.


I'd love to explore how this data could be used to enhance recommender systems.
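One simple way board data might feed a recommender, sketched under the assumption that all we have are (user, title) pairs meaning "this user posted on this title's board": treat co-posting as implicit feedback and rank titles by the cosine similarity of their poster sets. This is plain item-item collaborative filtering, not a description of any existing service:

```python
import math
from collections import defaultdict
from typing import Iterable


def build_index(pairs: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each title id to the set of users who posted on its board."""
    posters: dict[str, set[str]] = defaultdict(set)
    for user, title in pairs:
        posters[title].add(user)
    return dict(posters)


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_titles(title: str, index: dict[str, set[str]], k: int = 5) -> list[tuple[str, float]]:
    """Return up to k other titles with nonzero similarity, best first."""
    target = index.get(title, set())
    scored = [(other, cosine(target, users)) for other, users in index.items() if other != title]
    scored = [(t, s) for t, s in scored if s > 0]
    # Sort by descending similarity, then by title id for stable ties
    scored.sort(key=lambda ts: (-ts[1], ts[0]))
    return scored[:k]
```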


AFAIK, Jinni is doing exactly that.


Thank you for this. I'm curious if IMDB considers their (soon to be former) forum posts under copyright, or if you already have a plan in place.

One actually hopes that IMDB doesn't in fact have any claim on posts written by users that happened to be hosted on their site.


From http://www.imdb.com/board/announcement

"...disable our message boards on February 3, 2017 but will leave them open for two additional weeks so that users will have ample time to archive any message board content they'd like to keep for personal use."

Note the "personal use" above. Is a website like MovieChat.org considered personal use?

From http://www.imdb.com/conditions , section Copyright: "The compilation of all content included in or made available through any IMDb Service is the exclusive property of IMDb...".


Yeah that's a good question. Before I became an engineer I practiced IP/copyright law. Also spoke (colloquially) with a few ppl at the company.

I did my homework before starting this project and I'm confident we're in a good position :)


Could you explain why you think you're in a good position? From what I understand, the comments are owned by the users, and by posting them to IMDb, they granted IMDb and its affiliates the right to use them.

The users did not give you permission to use their data, and you're not associated with IMDb, and therefore do not inherit the rights.

Lastly, IMDb specifically states that scraping or extracting data is forbidden without written consent.


>>Before I became an engineer I practiced IP/copyright law.

I'm sure there's a story there since I would have thought the usual direction, even for a small aggregate number of eventual engineer-patent attorney pairings, would be reversed. Hopefully you can post an "About Us" on MovieChat.org at some point when you feel comfortable. Thanks!


Interesting project? The IMDb message boards were a good way to get some additional insight after watching a movie. Or at least give me a good laugh at the trolls. :)

Who owns this post data? IMDb? The users? Do you have any plans or ideas for existing IMDb users to somehow claim their username on MovieChat, linking to their archived posts?


> The IMDb message boards were a good way to get some additional insight after watching a movie. Or at least give me a good laugh at the trolls. :)

Totally agree. I often also went there before watching a TV show or movie to get an idea what it was about or if I'd like it (though wary of spoiler potential), whereas I almost never read the reviews there, which I find are often far more useless than the message board comments.

After they're gone IMDB will be just a soulless movie listing site, with no reason to visit other than to check a mark out of ten, and no reason to stick around afterwards. :/


Hi thanks for posting, looks like it has some potential.

One thing I didn't like is that if I search for a show, I get a hit for each episode.

It would be better to have one entry per show, and then you can drill down into individual episodes.

It would also be a good feature to be able to rate shows (as a registered user, but I guess IMDB is still offering that functionality).
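The "one entry per show, then drill down" suggestion could be sketched like this, assuming each search hit carries hypothetical `show_id`, `show_title`, and optional `season`/`episode` fields (MovieChat.org's actual data model isn't described in the thread):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hit:
    show_id: str
    show_title: str
    season: Optional[int] = None   # None when the hit is the show itself
    episode: Optional[int] = None


@dataclass
class ShowResult:
    show_id: str
    show_title: str
    episodes: list[tuple[int, int]] = field(default_factory=list)  # (season, episode)


def group_hits(hits: list[Hit]) -> list[ShowResult]:
    """Collapse per-episode hits into one result per show.

    Shows keep the order in which they first appear in `hits` (the search
    ranking), and each show's matching episodes are kept, sorted, for a
    drill-down view.
    """
    grouped: dict[str, ShowResult] = {}
    for hit in hits:
        result = grouped.setdefault(hit.show_id, ShowResult(hit.show_id, hit.show_title))
        if hit.season is not None and hit.episode is not None:
            result.episodes.append((hit.season, hit.episode))
    for result in grouped.values():
        result.episodes.sort()
    return list(grouped.values())
```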


So basically reimplement IMDB?


I made a userscript to be a drop-in replacement for the current discussion teaser on a movie's page on IMDb, but it gets the discussions from moviechat.org (and links to them directly).

For it to work, I would need Jim to enable Access-Control-Allow-Origin for requests from imdb.com (and ideally make a JSON/XML version of the thread list, since I'm regexp'ing the HTML at this point).

Maybe it would help persuade him if we showed that there was some interest for this.

https://imma.gr/66880x4ca8b
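On the server side, the two things asked for above (an `Access-Control-Allow-Origin` header for IMDb pages and a JSON thread list) might look roughly like this with Python's standard library. The allowed origins, the `/api/threads/<title_id>` route, and the thread fields are assumptions for illustration; MovieChat.org's actual stack isn't known:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Origins allowed to read the JSON from a browser (assumption: IMDb pages
# are served from these two hosts)
ALLOWED_ORIGINS = {"http://www.imdb.com", "https://www.imdb.com"}

# Hypothetical in-memory data standing in for the real database
THREADS = {"tt0000001": [{"title": "That ending?", "url": "/threads/1", "replies": 12}]}


def cors_headers(request_origin):
    """Return the CORS headers to send for a request with this Origin.

    Access-Control-Allow-Origin takes a single origin (or "*"), so the
    matching origin is echoed back, with "Vary: Origin" for caches.
    """
    if request_origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": request_origin, "Vary": "Origin"}
    return {}


def thread_list_json(threads):
    """Serialize a board's thread list as JSON instead of HTML."""
    return json.dumps(
        [{"title": t["title"], "url": t["url"], "replies": t["replies"]} for t in threads]
    )


class ThreadListHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        prefix = "/api/threads/"  # hypothetical route
        if not self.path.startswith(prefix):
            self.send_error(404)
            return
        title_id = self.path[len(prefix):]
        body = thread_list_json(THREADS.get(title_id, [])).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in cors_headers(self.headers.get("Origin")).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), ThreadListHandler).serve_forever()
```

Because the userscript would make a plain GET with no custom request headers, browsers treat it as a simple CORS request, so no preflight `OPTIONS` handler should be needed.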


Hey Jim – thanks for doing that! I'm curious, how did you back up IMDB's message boards? Did you scrape the whole website or were you able to find a data dump?


I wanted to ask this too. Did you scrape it or did it via some api? Anyway, cool project, wish you all the best with it.


Illegally scraped. IMDb doesn't have an API for this, and the automated scraping OP did is against their TOS, not to mention it was done without the consent of users; for instance, he stole 14 years' worth of my posts.


He stole them? You mean they are gone now? Because the closest thing I can imagine happening is an unlicensed reproduction of your content, your credentials included, like any archive service would do.


I take it then, when you posted these 14 years of comments, you didn't want them available to the public to, I don't know, read?! Thank goodness IMDB deleted them all before we happened across them by accident & violated your privacy.


Nice, but do you have the consent of the comment authors to duplicate the content? Normally, UGC (user-generated content) remains under the copyright of the users.


No, he doesn't, and he illegally copied 14 years of my posts, which I don't like, especially since I moved to TMDb, which is a proper movie database where you can also import your IMDb ratings and watchlist, and each movie there has its own discussions.


Great effort. I see some broken images here: http://www.moviechat.org/movies/search?name=westworld

I'm surprised more people don't recognize TMDb as relevant, especially since IMDb is ignoring users by taking down the forums.


Agreed on the second part: TMDb is the only real alternative. It has information about each movie and TV show, discussions for each of them, and you can even import your IMDb ratings and watchlist there. Plus, it has been running since 2008, it is edited by its users, and the developer is very responsive and provides an API free of charge.


Thank you, THANK YOU, THANK YOU!!!!

I just heard about the IMDb message board deletion and was feeling deeply sad about it. I love cinema, and their message board (despite its flaws) was a valuable resource. It was the only place where you could go to participate in in-depth discussions of even the most obscure films. I used it for 14 years and learned so much from other users.

As far as I'm concerned you are a SAINT for preserving this bit of film and internet history. Major gratitude to you for your caring and putting the new site together.

I honestly want to cry, I am so relieved.


Thank you for helping further fragment the IMDb alternatives by creating a dumb forum like tons of other generic forums, when there is a REAL alternative movie database with discussions for each movie, called The Movie Database (themoviedb.org), running since 2008 and edited by its own users like Wikipedia.

It's nice you created a searchable archive of IMDb discussions, though the guys at Archive are doing the same anyway without posing as an alternative, and I am not very happy that someone took 14 years of my IMDb posts and put them on his website without my agreement.


You are fully within your legal rights (based on most countries' data protection laws; I do not know where you are) to reach out to MovieChat.org and formally request that your data be deleted. If it bothers you that much, I fully recommend you do so.

However, surely rule #1 of the internet is: once you've posted it, it's out there forever. Are you going to go around to every cache server and everyone's browser cache and demand that your comments be deleted from there as well?


Shouldn't it be the other way around? First he should have consent to publish that data. Instead he illegally copied the data and published it, and now I must take some action because he did something illegal?

As much as I hate IMDb shutting down the boards, I hope they will close this illegal website.

OTOH, I have no problem with a recognized Archive.org doing this instead of this solo thief.


Please stop your trolling. Nothing is being "stolen".


It's a good start. I mean, you've got the data from the original IMDB boards all laid out and accessible, so that's better than most alternatives at the moment.

However, I'm curious about exactly how you're going to promote this site. The people currently on IMDB likely don't know much about it, and it's unlikely the administration there will redirect people over when they view the pages for a movie or TV show on their site.

What's the plan to get the users whose data you scraped to come to this new domain and participate again?


He doesn't have a movie database with ratings and watchlists, so I see no reason to move there when I can have all of this plus discussions at TMDb (yes, they are quite empty at the moment because nobody illegally copied posts from IMDb there, including mine).


Why would I participate in a dumb backup of a database, done without my consent (OP stole 14 years of my posts), when I can go to a proper movie database with discussions and imported ratings and watchlist from IMDb, called TMDb (themoviedb.org)?


Thanks for making this. Some suggestions:

- How about linking back to the IMDB page for each movie?

- It would be great if you displayed various ratings from IMDB, Rotten Tomatoes and elsewhere kind of like JustWatch.com does (but with links to those ratings pages).

- Actually, it would be really cool if you linked to a JustWatch search too so the user can easily see if the movie is streaming anywhere that they subscribe to.

Another crazy thought - a merger with JustWatch would be awesome too (but I really wish JustWatch would also link back to IMDB, Rotten Tomatoes, etc).


Yeah I was actually thinking of doing this, just been super busy trying to archive everything from IMDB before the boards shut down.

Do you know the founders at JustWatch?


Hey Jim, one of the JustWatch founders here - we'd love to have linkouts as well. Unfortunately, interests with IMDB aren't super aligned at the moment, and they've been in business much, much longer than we have. Let's say they keep a pretty close guard on their data, as they've spent the last dozen years collecting it.

But rest assured we're working heavily behind the scenes to make our site & UX better and better every day with free and open movie data sources - but always trying to stay safe on the legal side.


Thank you!!

To my knowledge this is the only instance of someone managing to scrape the IMDb boards; I'm very curious how you managed that.

I really hope you get a sponsor/donations. This needs to be promoted to other sites/social media and I will do my part.

I do have one request - can we do this for actors as well?


http://tracker.archiveteam.org/imdb/

You can even easily join in if you want.


Somewhere buried on IMDB is the ability to download the IMDB database of actors/films/directors etc. I did it once, but discovered it's in such a mess (schema-wise) that it would be easier just scraping the actual IMDB pages I needed[1].

---

[1] To populate my Kodi .nfo files in a way that I was happy with. I know scraping is probably against IMDB's T&Cs, but as this was for my own use, I took the risk.
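For the Kodi use case in the footnote, writing a minimal movie .nfo from metadata you've already collected might look like this. The element names (`movie`, `title`, `year`, `plot`, `uniqueid` with `type="imdb"`) are assumed from Kodi's documented movie .nfo layout and may differ between Kodi versions; the values in the usage are placeholders:

```python
import xml.etree.ElementTree as ET


def build_movie_nfo(title: str, year: int, imdb_id: str, plot: str = "") -> str:
    """Build a minimal Kodi-style movie .nfo document as an XML string.

    Assumes a <movie> root with <title>, <year>, optional <plot>, and a
    <uniqueid type="imdb"> element; check the Kodi wiki for your version.
    """
    movie = ET.Element("movie")
    ET.SubElement(movie, "title").text = title
    ET.SubElement(movie, "year").text = str(year)
    if plot:
        ET.SubElement(movie, "plot").text = plot
    ET.SubElement(movie, "uniqueid", {"type": "imdb"}).text = imdb_id
    # ElementTree escapes &, <, > in text content automatically
    body = ET.tostring(movie, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


def write_movie_nfo(path: str, **fields) -> None:
    """Write the .nfo next to the video file (path chosen by the caller)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_movie_nfo(**fields))
```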


Promoting users' content, illegally copied without their consent? For what it's worth, he copied 14 years of my posts and I don't like it.


Take a deep breath.


IMDb also 'copied' all your posts, right? How is this any different?


No, I published my comments on the IMDb website under their ToS, so I am fine with them doing whatever they want on their website. I don't like it, but it's their right; meanwhile, this guy has no rights to this content.


Jim, first, thank you. But I accidentally signed up with my whole email when I meant to use only the first part as my username. Can you help? Dini


What annoys most users the most is that years of profile status are now gone. If you could somehow manage to get profile verification in sync, that would do wonders.


My university's internet blocks the site as “Pornography”. Was the domain name previously used for some other use, perhaps?


Please add HTTPS; now, with Let's Encrypt, there's really no excuse not to have it.


I wish! Windows servers are an excellent excuse not to have it. Two years running now, there still isn't a practical solution for Windows servers, and none is in sight for the foreseeable future.


I totally agree with this.

I am not comfortable running a daily, Task Scheduler-based, third-party script with local admin rights on my Windows Server that checks my Let's Encrypt certs and auto-renews them.

However, now that Chrome v56+ is distrusting StartSSL.com certs, I've probably got no choice. Either that, or actually PAY for a multi-domain cert (Comodo do a fairly cheap one).
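The "check the certs, renew only when needed" part of such a script is small. A minimal sketch using Python's standard `ssl` module to read a live certificate's `notAfter` date; the 30-day threshold mirrors certbot's default renewal window, and the actual renewal and Windows store import are deliberately left out:

```python
import socket
import ssl
import time
from typing import Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter time.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "May 16 12:00:00 2017 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def fetch_not_after(host: str, port: int = 443) -> str:
    """Connect over TLS and return the served certificate's notAfter.

    The default context verifies the certificate, so this raises if the
    cert has already expired or doesn't match the host.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_renewal(not_after: str, threshold_days: int = 30, now: Optional[float] = None) -> bool:
    """True when fewer than `threshold_days` remain before expiry."""
    return days_left(not_after, now) < threshold_days
```

Only the step that actually replaces the certificate needs elevated rights; the check itself can run as an ordinary user.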


> third party script with local admin rights

Why? Just have it drop the certs in some directory and then reload the services. Just give the account running the script permissions to only reload services.

I assume such is possible on Windows, but I don't know for sure because I only use Linux servers. It is trivial there, so I assume you can do it on Windows as well.
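A sketch of that split, assuming a privileged renewal job has already produced the new PEM files and an unprivileged hook only needs to place them and trigger a reload. The destination directory and reload command are placeholders (on a Linux box the command might be `["nginx", "-s", "reload"]`):

```python
import os
import shutil
import subprocess
import sys
import tempfile


def install_atomically(src: str, dest_dir: str) -> str:
    """Copy src into dest_dir, swapping it in with one rename so the web
    server never reads a half-written certificate.

    Note: mkstemp creates the temp file readable only by its owner; adjust
    permissions if the server runs as a different user.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    fd, tmp = tempfile.mkstemp(dir=dest_dir)  # same filesystem as dest
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        os.replace(tmp, dest)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise
    return dest


def deploy(cert_files: list[str], dest_dir: str, reload_cmd: list[str]) -> list[str]:
    """Install the renewed files, then run the (placeholder) reload command.

    check=True raises CalledProcessError if the reload fails, so a broken
    deploy is noticed instead of silently ignored.
    """
    installed = [install_atomically(f, dest_dir) for f in cert_files]
    subprocess.run(reload_cmd, check=True)
    return installed


if __name__ == "__main__":
    # Demo with throwaway directories and a no-op stand-in for the reload
    src_dir, dst_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
    src = os.path.join(src_dir, "fullchain.pem")
    with open(src, "w") as fh:
        fh.write("placeholder certificate\n")
    print(deploy([src], dst_dir, [sys.executable, "-c", "pass"]))
```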


It's not, unfortunately. Certs are held in the Local Computer (machine) certificate store on Windows, which you need elevated rights to update. It sucks, but that's how it's designed. You also need to re-associate the renewed cert (once it's in the store) with the IIS binding, and then you can stop/start the website instance, again needing elevated rights.


Serving web content on Windows is not a good excuse for anything. Move it to linux or BSD.


I don't get it. Are you talking about SSL on Windows in general, or Let's Encrypt on Windows in particular? Because the former is absolutely possible, and not that hard to add to IIS.


Let's Encrypt can still be pretty convoluted and is not headache-free, even using the officially anointed tools.

Despite your comment and the (well-intentioned) plans from browser vendors to do what they can to squash unencrypted HTTP, failure to use HTTPS, even with Let's Encrypt, is still a totally forgivable sin today.


I disagree. By just running https://caddyserver.com/ you would automatically have an HTTPS site up in seconds, and it can be used as a reverse proxy too.

With nginx:

- `certbot certonly`

- press `2`

- type in your domain name

- press return, done

Add a few lines to your nginx config, done.

```nginx
ssl_certificate     /etc/letsencrypt/live/<yourdomain>/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/<yourdomain>/privkey.pem;
ssl on;  # deprecated since nginx 1.15.0; newer setups use "listen 443 ssl;" instead
```


Just to confirm, do you think "swap out your webserver for something else" is a reasonable demand?

> `certbot certonly`

You make an assumption that this is going to work with no hiccups, across all environments. Spoiler: it doesn't. I specifically alluded to certbot and certbot-auto being (surprisingly) rough around the edges.


I'm not demanding anything, I'm just listing some options and emailed the OP to offer my help.


Look, you entered a conversation where the context was that there is "no excuse" not to be using TLS today. To agree means to place demands on the people operating the endpoints where HTTPS is not yet rolled out. Your response was that these operators can "just run" their websites with TLS by swapping out their backend. You either agree that's a reasonable demand, or you don't.

Now, it is possible that you don't agree with that specific demand, and that something that involves switching the backend is merely one viable option—it's sufficient and not necessary to achieve that goal. And that's fine. But as someone who showed up to throw in their support for the claim that it's inexcusable not to be using TLS today, then the burden reverts back to you to justify the claim.

So, I ask you, as someone on record as disagreeing that it's still forgivable to be running an HTTP-only site in 2017: what are, as you see it, the minimum reasonable demands to be placed on someone operating a website?


I'm not sure why you even want to discuss the details of my simple suggestion or "demand" at length like that...

But to answer your question: I think if you are running a message board / forum for people to discuss various topics in general you should try to keep your users as safe as possible within your means. That means https, no plain text passwords in the database - basic stuff really.

PS: The DNS on your website as linked in the profile isn't set. Only the www. subdomain works.
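"No plain text passwords in the database" in practice means storing a salted, deliberately slow hash. A minimal sketch with the standard library's `hashlib.scrypt`; the cost parameters are common illustrative values rather than a tuned recommendation, and a maintained library such as bcrypt or argon2-cffi is the more usual production choice:

```python
import hashlib
import hmac
import os

# scrypt cost parameters (illustrative; N=2**14, r=8 uses about 16 MiB)
N, R, P = 2**14, 8, 1
SALT_BYTES = 16


def hash_password(password: str) -> str:
    """Return 'salt$hash' (both hex) to store instead of the password."""
    salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=bytes.fromhex(salt_hex), n=N, r=R, p=P)
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Storing N, r, and p alongside each hash would make it possible to raise the cost later without invalidating existing accounts.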


> I think if you are running a message board[...]

Are we talking about the suitability of operating any unencrypted endpoint without TLS, or are we limiting ourselves to message boards? If the latter, does that mean there are forgivable reasons to run HTTP-only applications that aren't message boards?

> The DNS on your website as linked in the profile isn't set

DNS is set up, it just doesn't have an A record or CNAME for the bare (second level) domain. That's intentional.

> Only the www. subdomain works

The other subdomains are certainly working.

> I'm not sure why you even want to discuss the details of my simple suggestion or "demand" at length like that

Because in response to this:

> failure to use HTTPS[...] is still a totally forgivable sin today

You said this:

> I disagree


Yeah, but that's a web server almost no one has heard of, and people may not even be in a position to switch to it.


I'd say it's pretty well known if you have anything to do with Go but there's a simple tutorial for most web servers. If you are able to build a site like that, editing some web server configs by following a tutorial shouldn't be a problem.

In any case, I like the project but I think that in 2017 it's irresponsible to do logins and user registration over http - even if it's just a weekend project.


theimdbforum.com


Holy hell that's a ton of data you scraped


Second request: Help! I was active on IMDb under dini0519 but am now having trouble signing on under that name because I had previously signed up with my whole email, not realizing that would be the username. Not really wanting my complete email floating around, I tried to shorten my username to the previous dini0519 and even just "dini", but nothing is working. I really want to be part of this community, especially the Twilight Zone, The Golden Girls, and Big Brother boards. Please help.



