The creator of HaveIBeenPwned clarified on Twitter that there are no plans to add "search by phone number," so odds are that, if your data is in this leak, you won't see it through HaveIBeenPwned.
More than 500m Facebook records were leaked and fewer than 10m of them have an email address; far more records have a phone number but no email.
Phone numbers are only 10 digits. That is only 10 billion combinations. With SHA-1, which Pwned Passwords uses, it should only take a couple of seconds on a decent system. Even with SHA-256 it could be done in about a minute.
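To make the small-keyspace point concrete, here is a minimal sketch in Python. The names `sha1_hex` and `recover_number`, the example number, and the narrow search range are all assumptions for illustration. A full sweep would be the same loop over all 10**10 candidates, and pure Python would be far slower than an optimized hash cracker.

```python
import hashlib
from typing import Optional


def sha1_hex(number: str) -> str:
    """Return the hex SHA-1 digest of a phone number given as a digit string."""
    return hashlib.sha1(number.encode("ascii")).hexdigest()


def recover_number(target_hash: str, start: int, stop: int, width: int = 10) -> Optional[str]:
    """Try every zero-padded number in [start, stop) until one hashes to target_hash."""
    for candidate in range(start, stop):
        digits = str(candidate).zfill(width)
        if sha1_hex(digits) == target_hash:
            return digits
    return None


if __name__ == "__main__":
    # Hypothetical number; only a 100,000-wide slice is searched to keep the demo fast.
    secret = "5550123456"
    print(recover_number(sha1_hex(secret), 5_550_100_000, 5_550_200_000))
```

Nothing here depends on SHA-1 specifically; any fast unsalted hash over a 10-digit input can be enumerated the same way.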
I believe this has been limited more recently. I just put my number in there and it just says that I can click OK to text myself a password reset, but didn't leak any details on screen.
Serious question: the initial leak misspelt "Austria" as "Austriaia", and chaos has ensued... the initial forum leak along with all ufile references and telegram groups I've been able to find all seem to have this "bug".
So I'm very interested in any insight you may have.
I don't have much insight on this. I accidentally downloaded that one first when I meant to download the Australian one, but quickly realised my mistake because the CSV file inside the archive was correctly spelt. Then I got the actual Australian one.
> That's because it's the austrian dump file. There's an update with the australian dump in the original raidforums thread, it's just missing from the pastebin index
> they were mixed as one file. leaker was australian
--
I obviously have no citations of my own for this anecdata.
Regarding the first post, I went through approximately 80 pages of "thanks", and one pair of mildly sore eyes later, was unable to find any "update". A torrent *was* posted somewhere around page 80 but this had the wrongly-named file in it. I have also seen a few TG groups that have renamed the file, but it's still the wrong one (like the GP described).
I can confirm that Austraiaia has an eclectic mix of Australia and Austria though - and also that grepping for 'Australia.*Austria\|Austria.*Australia' returns 170 results, grepping for 'Austria' returns 446375 results, and grepping for 'Australia' returns 408 results.
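For anyone who wants to repeat those counts without grep, a rough Python equivalent might look like the sketch below. The function name `count_matching_lines` is made up for illustration, and the counts it produces depend entirely on which file you feed it.

```python
import re
from typing import Dict, Iterable

# Same three searches as the grep commands: lines mentioning both countries,
# lines mentioning Austria, and lines mentioning Australia.
PATTERNS = {
    "both": re.compile(r"Australia.*Austria|Austria.*Australia"),
    "austria": re.compile(r"Austria"),
    "australia": re.compile(r"Australia"),
}


def count_matching_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Count, per pattern, how many lines contain at least one match."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Passing an open file handle (opened with `errors="replace"` so odd bytes don't abort the scan) streams the file line by line instead of loading it all into memory.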
FWIW I'm still looking myself. If you see this, feel free to email me (you don't have an email in your profile), and I can let you know if I turn anything up.
I don't know what there is to do other than to be aware that the combination of your name and number (and maybe some other details) is not private, so there's no reason to trust anyone just because they know those things.
As far as seeing what was leaked, you could find the data yourself (but I'm uncomfortable giving instruction on how to get it). It would be nice if it was possible to extend the tool to be able to send the information for a number to the number (because that's the only way I can think of that demonstrates ownership of the data) but that can't be done for free and it therefore seems too complicated to set up.
> The home page of that website is terrible, you claim to be a website for news but it’s really just far left propaganda, interspersed with communist agenda items
I see a post from Glenn Greenwald. I think most would consider him right leaning. I also see a post from The Dispatch, which Wikipedia describes as "center-right".
Glenn Greenwald is right leaning? I disagree with your appraisal that most would consider him that. He's highly critical of mainstream liberal media which creates the sense he must be 'on the other side,' but he's definitely a left liberal.
I think there's a failure of communication here because Americans[0] tend to conflate "liberal" with "left" although liberalism is not at all leftist and is far more compatible with right-wing ideologies than leftism.
In the most general sense left-wing ideologies are about abolishing hierarchies whereas right-wing ideologies are about either sustaining or enacting them. The difference between liberalism and conservatism is that the former promotes hierarchies that follow from property rights and the latter infers them from tradition (e.g. God's will or "natural order").
Just like anarchists and state communists can have leftist infighting it's possible for there to be infighting on the right. As the US has literally spent decades fighting leftist ideologies, the predominant political discourse in the US takes place between different flavors of the right.
No major media outlet in the US will promote abolishing property ownership or collectivising the means of production. When we talk about "leftism" in US media this usually refers to socially progressive ideas like public healthcare, individualist attitudes to anti-racism, LGBTQIA+ rights and so on. While these ideas naturally follow from most leftist ideologies, they're not addressing the underlying social hierarchies or systemic criticism. Case in point: abolishing the police is a systemic critique if taken literally, but in the public discourse this has been reduced to police reform, which is about tweaking the system, not replacing or abolishing it.
[0] While this is not unique to the US, I'm focusing on the US here because of the original context. There are more discussions to be had about how this presents in other countries and how the Cold War affected those but this is not the time.
It might be a good summary to say that Greenwald is intensely liberal in his stance on civil rights, so much so that he doesn't fit on the American liberal-conservative spectrum on this issue. He's willing to compromise with both liberals and conservatives to help get his message out. His overall stance is not left wing, because most of his other views (at least the stuff he emphasizes) are more (American) libertarian or liberal/conservative, not genuinely left wing.
I guess you're right that "most" was overly strong. But reading his Twitter, I can't think of him as anything but right leaning.
Just today he's got a tweet about "Russiagate"[1], one criticizing "British centrists and liberals"[2], one with him being interviewed positively on Fox News[3], two criticizing Hunter Biden[4][5], one criticizing "House Democrats' HR-1 bill"[6], one criticizing vaccine passports[7].
Maybe it says something about the news and your perceptions that someone who entirely agrees with the progressive social and political agenda is so critical of what is currently passing for progressive ideas and champions these days.
Sorry, I accidentally posted my comment before I finished it. Now I finished it with an edit.
To your point, his tweets seem to be entirely criticizing liberals; I don't see him criticizing conservatives. I'm not saying criticizing liberals is bad or that he should be criticizing conservatives. I'm saying that someone who spends all their time criticizing liberals and repeating right-leaning talking points is right leaning.
You can be critical of your own tribe without being on the other side. In fact in some ways it's more effective as it can shine a light on the crazies in your own tribe so that the moderates are seen as better.
Aaron Maté, who made the actual tweet about "Russiagate" that Greenwald merely made a passing reply to, is about as solid in his traditional left-wing bona fides as anyone can be, following in the footsteps of his father Gabor, a medical practitioner, researcher and author with a decades-long public profile espousing traditional left-wing values.
Aaron Maté's skepticism of the Russia narrative is not out of any support for Trump whatsoever - he's been highly critical of Trump - but is part of a track record of earnest investigative reporting about Russia that few other western journalists can match. One should never forget that as recently as 2012, to consider Russia any kind of threat to the US, which the Romney Republican campaign did back then, attracted mockery and derision from Democrats, and it was only in early 2017, just 4 years ago, that it suddenly flipped such that expressing skepticism about the influence of Russia in US politics suddenly became part of "right-leaning talking points".
All Maté is doing is remaining consistent with what has been the accepted Democrat/left-wing position for many years, right up until Clinton lost the unlosable election.
When Maté and Greenwald spend their time "criticizing liberals", their criticisms are always focused on the modern-day Democrats' track record of becoming too entangled with the big-corporate and military/intelligence establishment, and in doing so, abandoning their supposed left-wing ideals.
Just because some right-wingers criticise the Democrats/liberals for the same reason, does not make Maté and Greenwald right-leaning; they are simply staying true to the left-wing values they've held for decades.
Thanks for the comment. I see now it was more complicated than I originally thought. I don't think Greenwald fits cleanly into the US political alignment, and I think that's a good thing. Blind partisan alignment often causes a lot of bickering and anger toward the other side.
Well Greenwald has been posting some antimigration articles messages using language that skims very close to hard-right wording. Maybe he doesn't neatly fit into the (American) left/right schema, but by now he is mainly just a "bad media" propagandist.
Also I can't take anyone seriously who claims to criticise mainstream media (almost always the more liberal-slanted outlets), but goes regularly on Tucker Carlson without ever seriously criticising them. To see how to properly criticise Fox and Tucker, search for the leaked interview with Rutger Bregman.
> Greenwald has been posting some antimigration articles messages using language that skims very close to hard-right wording
That’s an inflammatory claim, that needs direct quotes/links to be taken seriously. But lest it be forgotten that there has long been an acceptance among traditional left-wingers (including Bernie until it became unacceptable to too many of his followers) of the need for some immigration controls in order to protect the most vulnerable workers, and historically the loudest voices for open borders are Laissez-faire/social-Darwinist libertarians, most notably the Kochs.
> Maybe he doesn't neatly fit into the (American) left/right schema
I think this is quite false: he fits into the traditional left/right spectrum as a bona-fide leftist. It’s the mainstream Democratic Party that has moved towards corporatism and militarism.
> regularly on Tucker Carlson without ever seriously criticising them
What would it change for him to criticize them; to try to impress the people who would still hate him anyway? Carlson gives him airtime and the opportunity to express his position to an audience of people whose minds he might be able to change about a few things. Why waste that opportunity with token criticism?
>> Greenwald has been posting some antimigration articles messages using language that skims very close to hard-right wording
>That’s an inflammatory claim, that needs direct quotes/links to be taken seriously. But lest it be forgotten that there has long been an acceptance among traditional left-wingers (including Bernie until it became unacceptable to too many of his followers) of the need for some immigration controls in order to protect the most vulnerable workers, and historically the loudest voices for open borders are Laissez-faire/social-Darwinist libertarians, most notably the Kochs.
Maybe in the context of what counts as left in the US. But the socialist left movement was always an international movement (I give you three guesses what the anthem is called).
Regarding some of Greenwald's words:
"Current illegal immigration – whereby unmanageably endless hordes of people pour over the border in numbers far too large to assimilate, and who consequently have no need, motivation or ability to assimilate – renders impossible the preservation of any national identity"
http://glenngreenwald.blogspot.com/2005/12/yelling-racist-as...
Calling immigrants "hordes" is quite definitely language that is what I consider "hard-right". It dehumanises people and tries to instil fear and a violent counter-reaction.
>> Maybe he doesn't neatly fit into the (American) left/right schema
>I think this is quite false: he fits into the traditional left/right spectrum as a bona-fide leftist. It’s the mainstream Democratic Party that has moved towards corporatism and militarism.
The mainstream Democratic Party was never not corporatist and militaristic. There has not been a change. However, I'm interested in what, in your eyes, qualifies Greenwald as a bona fide leftist (whatever that means).
>> regularly on Tucker Carlson without ever seriously criticising them
> What would it change for him to criticize them; to try to impress the people who would still hate him anyway? Carlson gives him airtime and the opportunity to express his position to an audience of people whose minds he might be able to change about a few things. Why waste that opportunity with token criticism?
Funny, he does criticise the NYT and many other media outlets though. But you're not wrong, he can express his position to an audience, but he's not trying to change minds, he actually agrees with much that Tucker Carlson says. That means he agrees with the agenda of a billionaire who has been using his media to clearly push a neocon, hard-right agenda worldwide. Considering that Greenwald is somehow building this persona of a media critic who points out the "agenda" of mainstream media, it is somewhat contradictory to go on those shows without a word of criticism and without blinking an eye.
- Those words about immigration were written in 2005. At worst they were unkind, hardly “hard right”, but he has since specifically disavowed those words [1][2]. Besides, your initial claim was that he was anti-immigration, which he has never been, just concerned about uncontrolled immigration, which is not inconsistent with being of the left, given that even Marx and other communists have wrestled with the pros and cons of immigration and issues around assimilation and solidarity [3][4][5].
- Greenwald has a long record of support for civil liberties, opposing wars and US militarism/interventionism, opposing Israel’s conduct towards the Palestinians, campaigning for animal rights, supporting workers’ rights and unions, opposing “neoliberal” economics and corporate power, supporting universal health care and welfare. Which of these positions are not consistent with the left? And what actual policies has he supported in the past 15 years that are right wing?
- He criticizes NYT and other mainstream media outlets for posturing in support of token liberal issues but ultimately promoting the interests of the corporate and military/intelligence establishments, which it can be reasonably argued, results in far greater real-world negative consequences than anything Tucker Carlson does. I expect you’ll disagree with that, which you’re entitled to do, but simply going on a conservative commentator’s show does not make you “right wing” when you have an at-least 15-year record of advocating for many clear-cut left wing positions and zero clear-cut right wing positions.
Similar to my reply a bit further down, Marx (and the view of many of the socialists at the time) was not anti-immigration, or anti-immigrant.
He (they) did not blame the immigrants for coming and in fact did not see anti-immigration laws as the solution (arguably having illegal immigrants leads to even more exploitation), but instead saw international organisation of workers as the solution. Your reference [3] says that and there are quite a few more writings from the time along similar lines (I will try to find the references later).
I don't think any of this is counter to what Greenwald, I or a great many moderate voices would contend about immigration in the modern world. If pointing out that immigration has tradeoffs and requires some degree of organisation and solidarity among workers was not bigoted when Marx and other early communists said it, it is not bigoted to say similar things today.
If positions are rooted in hatred based on race or class, of course that's different, but Greenwald has said nothing of the sort, and all the positions he's expressed in the past 15 years make it clear he doesn't think anything like that.
That "hordes" phrasing is at the center of it. And Greenwald's blog post is all about national identity, something Marx did not write about and actually blamed as part of the problem.
You can either not write about something OR blame it as part of the problem, but not BOTH ?!?
In [3] :
> Marx did not elaborate on his reasons for writing that Irish immigration reduced English workers’ wages. He implied that the cause was an oversupply of manual laborers, but his other statements indicate that he considered English xenophobia and the resulting antagonism among workers an even greater problem.
[...]
> In his 1870 letter, Marx described what he then considered the overriding priority for labor organizing in England: “to make the English workers realize that for them the national emancipation of Ireland is not a question of abstract justice or humanitarian sentiment but the first condition of their own social emancipation.”
That's a pretty bold statement, you have any sources to back that up?
The First International was about uniting international workers; there are in fact several writings that say the bourgeoisie use nationalism and anti-immigrant sentiment to divide the working class.
you said, quote:
"The first communist international was mainly about antiimmigration."
now the citation you give is they _amongst other things_ wanted to prevent the import of foreign workers to break strikes.
Apart from the fact that "amongst other things" does not make it "the main thing", it was also very much not anti-immigration. The formation of the International Workingmen's Association was about creating an international organisation to prevent "capitalists" from playing the workers in different countries against each other. In other words, they realised that unless workers organised internationally, they would always be on the back foot.
Anyone who is seen as critical of modern woke ideology is labelled "right wing". I read a post on HN the other day labelling the current British conservative party as "extreme right". These people must live in a Twitter bubble or something.
I found it on some sketchy download site that I had to use jdownloader for and solve many captchas based on a tip from a past HN story.
After that, you'll find that the data is poorly and inconsistently encoded (lots of ugly BOMs), and a bunch of files are split in weird ways (I had to concatenate them and then give them sensible names). Figuring out what order they go in (they're not all split on a record boundary!) may take some translation, too. After that, a bunch of them have bad, broken CSV headers and you have to find something that can manage huge files to edit the first couple of lines.
Then you find that all of them use different delimiters (colon or comma), some of them quote each entry, and they each have a different set of data that doesn't match up at all. I'm still trying to figure out what the data in each file is and see if it can be normalized into one schema.
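A rough sketch of how the encoding and delimiter guessing could be automated in Python is below. The function names are made up for illustration, the only delimiters considered are the colon and comma mentioned above, and the BOM table is an assumption about which encodings show up. Files without a BOM are simply assumed to be UTF-8.

```python
import codecs
import csv
import io
from typing import List

# BOMs are checked longest-first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Pick a codec from the leading BOM, falling back to `default`."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def guess_delimiter(sample: str, candidates: str = ":,") -> str:
    """Return whichever candidate delimiter occurs most often in the first line."""
    first_line = sample.splitlines()[0] if sample else ""
    return max(candidates, key=first_line.count)


def read_rows(raw: bytes) -> List[List[str]]:
    """Decode the bytes, sniff the delimiter, and return the non-empty rows."""
    text = raw.decode(guess_encoding(raw), errors="replace")
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=guess_delimiter(text))
    return [row for row in reader if row]
```

The `csv` module strips the per-entry quoting for you, so quoted and unquoted files come out in the same shape. Lining up the differing column sets still has to be done by hand.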
So it's about like downloading a phonebook for half the world with some extra mystery data that has no description where you have to guess what it even means.
Interestingly, I checked for a family member who deleted their FB a few years ago and they're not in there. It'd be interesting to see if there are any accounts that deleted before 2019 in the data or not. I suspect not, but this isn't enough of a test to prove it.
really? i only checked one of the country zips but it was 6 colon-delimited txt files that were easy to grep to see if any numbers i knew were in there
They're all over the place. There are several dozen files, each named after a country. Some contain a single text file that's normal, others have different delimiters, encodings and other nonsense. The last record of some of the split files is split across files, etc.
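One way to cope with records that straddle two part-files is to stitch the parts together as raw bytes and only then split on newlines. The sketch below assumes a single-byte newline (so UTF-8 or ASCII, not the UTF-16 files) and that you already know the correct order of the parts; `iter_records` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_records(chunks: Iterable[bytes], newline: bytes = b"\n") -> Iterator[bytes]:
    """Yield complete records from chunks whose boundaries may fall mid-record."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the tail may continue
        # in the next chunk, so carry it over.
        *complete, pending = pending.split(newline)
        yield from complete
    if pending:
        yield pending
```

The chunks can be whole part-files or fixed-size reads from them, so nothing larger than one chunk plus one partial record is held in memory at a time.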
This is not particularly high-quality data. I'm sure a lot of people didn't look at the data very much and just ignored a few lines of errors on import, or did simple greps as you say, because it's a pain in the rear to find editors that can even handle multi-GB files. Even good converters like iconv can bomb out on, I think it was Qatar (which was also little-endian UTF-16 with a BOM).
But yes, it's all over the place. I love how one of them decided to number things 1, 2, III, 4, though maybe that was the translator, I dunno, I never learned to read Arabic letters...
I haven't seen one or it would've saved me a ton of trouble. I think it might be quite a while before I finish putting this into an sqlite database given all the schema nonsense and the weekend being almost over now.
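For the sqlite step, something like the following sketch might be a starting point. The table name `records` and its four columns are placeholders, not the real schema (which is the unsolved part). Rows with too many or too few fields are truncated or padded with NULL rather than rejected.

```python
import sqlite3
from typing import Iterable, Sequence

# Hypothetical minimal schema; the real files carry more (and differing) columns.
COLUMNS = ("phone", "fb_id", "first_name", "last_name")


def load_rows(conn: sqlite3.Connection, rows: Iterable[Sequence[str]]) -> None:
    """Create the table if needed and bulk-insert rows, padding short ones with NULL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(phone TEXT, fb_id TEXT, first_name TEXT, last_name TEXT)"
    )
    width = len(COLUMNS)
    normalized = (tuple((list(row) + [None] * width)[:width]) for row in rows)
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", normalized)
    conn.commit()
```

Because `executemany` consumes a generator, the rows can be streamed straight from a parser without building a multi-GB list first.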
Only public profile data was leaked, and your phone number if you've enabled looking up your profile by phone number. Facebook's privacy checkup flow will walk you through these settings: https://www.facebook.com/privacy/checkup
I'm pretty sure everybody was 'opted-in' to this feature by default. Afaik I have never actively enabled it and yet I'm in the leak. I provided FB my phone number for 2FA purposes. I'm dumbfounded if they actually allowed people to look up my profile with my 2FA info.
Every privacy offending feature on FB is 100% by default opt-out. This is their ethos. When you gave them your number, you gave them a way to not only permanently & consistently identify you but also let others "discover" you/r profile aka reason for this leak. Using your number as 2FA auth is just a side bonus.
To clarify: my reading of the parent post was that "by default opt-out" means the user is opted out by default, which is contrary to my understanding that privacy-eroding features of Facebook are turned on by default (i.e. "default to opt-in").
On the flip side, this data is already public (it's a leak). And by HIBP exposing it, the cost of selling this data goes down, playing the long game to help us.
By HIBP locking down this data, you get a false sense of security. (it is still public info, just only in sketchier places), and there becomes a higher market for selling it.
Interestingly, the Russia files are missing a huge range of phone codes, from +7(910) to +7(929).
There are three top mobile operators in Russia: MTS, Beeline and MegaFon. Each of them has its own set of phone codes. The majority of MTS numbers and the better half of MegaFon numbers are not affected, but all Beeline codes are exposed.
One thing that really keeps annoying me about HaveIBeenPwned is the fact that I can enter anyone's email and get their status immediately. This way I can anonymously check if that email had an account at a breached site at the time of the breach. The obvious solution would be sending out a link via email before results can be looked up, but this might not be in HaveIBeenPwned's interest.
This isn't universally true, the site does have a notion of "sensitive breaches" which will not be visible to someone who cannot confirm ownership of the email address.
Some kind of "prove it's you" function would be great.
When you register for notifications it sends out a verification email. That would be a good time to let you disable public lookup of your email address.
I also wish HIBP could securely disclose the snippet of information from a leak that's relevant to you. Knowing the password or hash characteristics, phone number, etc. could aid in mitigation, and seeing the raw impact might help motivate ordinary users to improve their security hygiene.
Suggested that idea to Troy in the past, and got the impression he's not amenable, largely due to the risks of hosting PII. Can't really blame him.
For anyone else who's hesitant to use the link, these are the three choices presented when you opt out: https://i.imgur.com/4lB2bSq.png
If you pick the first one, you can still use the notification service (https://haveibeenpwned.com/NotifyMe) to email yourself a private link to check what breaches include the address (past and current).
It’s not HIBP that caused that data leak. The fact of which emails have accounts on which sites is public following the breach/leak of that site’s data.
Yes, but you'd actually have to look for or even buy the leaks themselves in order to find out if somebody has been "pwned". I'm not arguing against the service in general, but there's an obvious way to improve privacy.
It seems a bit like the FBI warning at the beginning of DVDs, the only people that are going to be bothered by it are people that were going to play by the rules in the first place. The privacy was already lost, you can't increase it again retroactively.
The ones marked sensitive have more of a pattern of "we don't want to be caught in a court case about hosting the info" than a pattern of trying to improve privacy in the ways this thread is suggesting.
What if I use HIBP to check if any of my friends had registered accounts on a Furry Findom forum, which was recently breached? It's potentially embarrassing.
On the other hand, that means I don't have to do an annoying verification dance to look up my pwnages, and can help others look up their breach information without having to explain to them over the phone how to click a link in an e-mail.
Yes then they could also show the actual data leaked. I'm supposedly in 11 breaches that include things like my date of birth and physical address. Maybe. But I don't really know.
Did I give a fake DOB? Is it a previous address? Do they actually even have an address or is it just NULL for me? HIBP won't tell me.
The point of hibp is this information is now public. Putting it behind some kind of identity check would be security by obscurity. The bad guys have the raw data and can look up any address at will.
Public isn't "binary". Finding the leaked files, downloading them (often paying to do so), aggregating all the data from all of them and searching them all is a huge pain-in-the-butt.
99.99% of people would not go through this effort and lack the skill/knowledge, even if they were motivated against someone.
As a small business owner, there are a LOT of hateful people out there that would gladly do harm if they had more skills - thankfully they are also incompetent.
But it is only thanks to the HIBP integration in browsers' password managers that many people learn their account was leaked, even when the breached company didn't disclose it (as is the case in most countries).
There's still a gap when it comes to detecting leaked file data online. Say our files end up on Pastebin[1] after a ransomware attack; currently there seems to be no way to know, short of manually monitoring sites where leaked data is posted or having the attackers themselves let you know.
The percentage of breaches in this database that are “MongoDB instance exposed to internet without a password”... yikes. Why on earth is no password the default?
Found it a minute later on btdb. The data is really not interesting. It's like a 2021 version of the phone book that was common when I was younger. Where it becomes slightly dangerous is that names, and sometimes birthdates and emails, are linked to it.
username and SHA/MD5 password dumps are more interesting to analyze though.
Indeed, people assume haveibeenpwned is trustworthy when it seems to be a centralised place of valid emails from people that care about security and thus might have or control something of value?
I mean, HIBP is run by Troy Hunt, who has a good record as a security researcher. So yes, it's a centralised place, but all it does is aggregate breaches and make them searchable (to a degree, and not in a way that is useful for nefarious users). The breaches are not shared, but those breaches are out there and nefarious users don't need HIBP to get access to this data. It is really only useful to end users who want to see if they are at risk.
We don't know what that server is running. If it keeps a log of all the queries then it has a pretty nice list of emails from people that might make good targets.
Go read up about HIBP and Troy Hunt. It doesn't log requests, it tries to make it as difficult as possible for nefarious users to get any data from it. All the emails it checks are already included in public breaches which are available in the wild.
You seem to be throwing shade at a service and person you haven't researched and all behind a new account. Very brave of you.
I honestly don't care who this guy is. A centralised server providing this service is a problem. It's not like he is using some magic decentralised thing to run queries so we have mathematical proof nobody can aggregate data. No, it's a server he alone has root running whatever software.
The notion itself of revealing info about you to a 3rd party in order to verify that more info hasn't been leaked seems... conflictual at some level.
Your account presumably isn't new. Are you any braver?
Given that Troy is quite knowledgeable about how breaches happen, if HIBP does get breached, it will likely be due to targeted hacking rather than negligence.
It's hard to say definitively how old this data is due to this being a partial breach, but at one time I had two separate login email addresses for facebook.
Email address A was a gmail address and is a single dictionary word. I moved away from it as a login email due to the tendency of people to blindly spam it and it being used as a throwaway email address when people sign up for accounts. At this point I've had to purge a half dozen accounts people created on Facebook using my gmail address. The most recent one was created 4 months ago. Due to a rapid succession of logins from places geographically far apart (South Africa and United States), it was flagged and disabled.
I started using a different email address for my Facebook login a little less than 2 years ago. It is a custom domain name.
Neither showed up in the Have I Been Pwned lookup tool under this Facebook breach.
I just searched myself and found that my "Apollo" information was leaked in 2018. I don't know what "Apollo" is though and don't think they should have my details. Does anyone know details about that leak?
It's odd because an email I only use on Facebook is in the Apollo leak (and the Zynga one due to Words with Friends on Facebook). So does Apollo have information from more than just LinkedIn?
Yeah I'm not sure, they might have scraped other info too e.g. GitHub. Frankly I think it's kind of silly to categorize this as a "breach" when all the data was publicly available. And if you're trying to hide your email, you're fighting a losing battle already. Best to just assume your email is public knowledge.
In this leak every entry has a phone number and a name. But only a few have an email address connected.
So it would be nice to have a service where you put in a phone number and it will list what information is available for it. Like this:
[X] Phone Number
[X] Name
[X] Current location
[ ] Previous location
[ ] Current employer
[ ] Email address
[X] Birthday
[ ] Full date of birth (with year)
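The reporting side of such a service is simple enough to sketch in Python. The dictionary keys below are made-up field names, not the columns of the actual dump, and the lookup and ownership-verification parts are deliberately left out. It only reports which fields are present, never their values.

```python
from typing import Mapping, Optional

# (hypothetical record key, label shown to the user)
FIELDS = [
    ("phone", "Phone Number"),
    ("name", "Name"),
    ("current_location", "Current location"),
    ("previous_location", "Previous location"),
    ("employer", "Current employer"),
    ("email", "Email address"),
    ("birthday", "Birthday"),
    ("birth_date_full", "Full date of birth (with year)"),
]


def exposure_report(record: Mapping[str, Optional[str]]) -> str:
    """Render a checklist marking which fields are non-empty in the record."""
    lines = []
    for key, label in FIELDS:
        mark = "X" if record.get(key) else " "
        lines.append(f"[{mark}] {label}")
    return "\n".join(lines)
```

Showing only presence, not content, keeps the response useful to the owner of the number without handing the underlying data to anyone who types that number in.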
I have a question: I was wondering if my old fb details might be part of this (I haven't had fb for about 4 to 5 years), so I searched for my email on hibp. No results for the fb leak, but hibp says my email is in the "Lead Hunter" leak from March 2020, and that "The data was provided to HIBP by dehashed.com.".
So I did the same search on dehashed.com, and got no hits (the part before "@" gets hits, but I don't care about that).
If the data comes from dehashed.com, why don't I find my email there?
I'm interested in knowing what, if anything other than my email is in there, because if anything significant is there, maybe I can figure out where the leak originated.
I don't believe that's been leaked. Ubiquiti is still claiming that they have no evidence that any data was stolen. While that seems improbable, and it seems that's just because they didn't have proper logging, it would be harder to maintain if the data was available.
The cols (I've only loaded the US version) seem to be cell_phone | fb_id | first_name | last_name | gender | lives | from | relationship_status? | works_at | ?some year month maybe date sms was given | email | bday?
Not that strange at all. A lot of people all over the world have multiple email addresses. Myself for instance, I have an email address from my telecom provider, I have a Yahoo! email address from when I signed up years ago, my Google account has about two or three different ways of referring to the email address, plus I have my own domain; so I have about five different personal email addresses, plus my work email address.
Given that this may include data about EU residents and the GDPR has strict rules about data breaches and may even impose fines in cases of negligence I wonder if anything will come from this.
How is it legal for HIBP to keep copies of breach corpuses on their servers? For me personally, having corpuses on my hard drive is like dealing with radioactive waste. There's so much PII to mull over that it feels naughty to sift through. There are even people on social media bragging about 'self doxing' their own info (just like with using HIBP), but using a local copy instead.
Yes but how is it legal to have multiple corpuses sitting on a gigantic server? Surely governments or even a LEA would want to regulate that? Having all that PII is like hoarding a bunch of radioactive waste.
They don't need to keep the data around longer than it takes to extract the affected email addresses, and that doesn't need to be done on a server at all.
[1] https://twitter.com/troyhunt/status/1378463581604220931