Hacker News | mschoebel's comments

I'm in Germany and I have a Geiger counter running 24/7. I just looked at the data for January and February and the only thing that I notice is a VERY slightly higher reading on February 4th with 0.1727 microSievert/hour. Average for January was 0.1674, lowest was 0.1631, highest was 0.1703. So February 4th was less than 6% higher than the lowest value from January.
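For reference, the "less than 6%" figure follows directly from the numbers above; a quick sketch:

```python
# Readings in microsievert/hour, taken from the comment above.
jan_avg, jan_min, jan_max = 0.1674, 0.1631, 0.1703
feb4 = 0.1727

pct_above_lowest = (feb4 - jan_min) / jan_min * 100
print(f"Feb 4 was {pct_above_lowest:.1f}% above January's lowest reading")
```

That comes out to about 5.9%, i.e. just under 6%.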

The difference was so small that I had just attributed it to normal fluctuations when I first saw it. Whatever caused this, so far it looks like it was a very small event.

I could probably provide a CSV-file with the raw data if anyone is interested. My Geiger counter stores a value every 5 minutes.


Could it be related to the accident in France? The date fits.

https://www.heise.de/tp/features/Flamanville-Unfall-im-Fiask...

http://www.fr-online.de/politik/unfall-explosion-in-akw-in-f...

Could you please provide the CSV file?


Tab-delimited CSV with the raw data:

https://geekregator.com/files/HackerNews-RawData2017.zip

The timestamps are in UTC+2. I keep the clock in the Geiger counter on daylight saving time. Too lazy to change it. :)


Out of total curiosity, what does your setup look like? Do you have a model # of what you use? Never read anything about this before. Thanks.


Not sure about OP's setup, but an easy one to get into is uRad[1]. Along with logging, it shares your readings around the globe, which helps with finding any kind of event like this. Though I'm not sure that they've got any kind of statistical analysis going on.

http://www.uradmonitor.com/


How expensive are those sensors?


Their Indiegogo put it at $90 for a kit to solder, and $120 assembled. No idea what the cost is now; it appears you have to email them via their website to order one.

https://www.indiegogo.com/projects/uradmonitor-environment-h...


I have the GammaScout Alert: https://www.gamma-scout.com/EN/Home.php

It stores the data in its internal memory which I then transfer via its USB port to my computer once a week.



Let's see if I got that right: Specification says I must load the AMP script from cdn.ampproject.org?

If yes, then there is no effing way that I will ever use this. I will NOT use something that forces me to load scripts from a host that I have no control over. Does nobody see what a HUGE security risk that is???


You got that right, and it's a fairly common critique of AMP.

The target group for AMP (traditional publishing sites) already loads crap from all over the net in general, and from Google in particular, so they don't care. But it leaves a very bad impression for an "open standard", yes.

Technically browsers could catch that include and replace it with local/cached logic, but I don't think that is happening or planned yet.


> Does nobody see what a HUGE security risk that is???

I dispute 'HUGE' (or even 'huge'). No more than using any CDN controlled by a large company.

1. You're welcome to say that CDNs are a risk in general but many reasonable people would disagree.

2. You're welcome to claim that Google is not to be trusted, but it rather depends on your audience. If you're providing a platform for the especially sensitive (anything related to politics/human rights/government/medical/financial might warrant extra caution) then I'd agree, but for the large majority of sites, loading javascript from Google is an acceptable trade-off.

So - I'm not disputing it's a security risk - I'm just not sure it's HUGE-in-capital-letters for most people.


Once AMP stabilizes, I'm hoping Google will encourage the use of SRI to ensure that the content is what a site expects: https://developer.mozilla.org/en-US/docs/Web/Security/Subresource...
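As a sketch of how an SRI integrity value is computed (the one-line script hashed here is made up for illustration, not the real AMP runtime):

```python
import base64
import hashlib

def sri_hash(script_bytes: bytes) -> str:
    """Return an SRI integrity value: 'sha384-' + base64(SHA-384 digest)."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode()

# Hash the exact bytes of the script you vetted; any change to the
# file on the CDN would then fail the browser's integrity check.
print(sri_hash(b'console.log("hi");'))
```

The resulting string goes in the `integrity` attribute of the script tag, so the browser refuses to execute the file if the CDN serves different bytes.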


Still unacceptable as it would still cause my users to expose their IP address to someone else's server.


Unfortunately the ubiquity of FB and G+ buttons, Google analytics and CDN use has raised a generation of web developers who don't see that as a problem.


Certificate Transparency will reveal the domains you connect to over TLS to a Google server anyway. Assuming you don't already use Google's DNS, that is.


Google could and should change their requirement to be that the integrity value for the script must be in their approved list rather than requiring their path.


Umm... FAQ says:

"What licence does Ultibo use? The GNU LGPL version 2.1, the same licence used by many other popular open source products."


I stand corrected. I thought it said GNU GPL.


FWIW... I just logged in to my Yahoo Account and removed the security questions. Just to be sure. I had already changed my password a few months ago when first rumors of this came up. I'm pretty sure that the option to remove the security questions wasn't there back then.


Pascal


Can you elaborate on this?


See: https://deusu.org

The backend is written in Pascal.

With the exception of the blog. That is written in Node.js. I did that as a way to learn Node.js. But as soon as I find some time I'll rewrite that in Pascal too.


I probably should also add that I'm using FreePascal - not Delphi - and that the servers are a mix of Windows and Linux.


Do you have a rough estimate of how many servers you will need for Elasticsearch for the 1.7bn URLs of the latest CommonCrawl?


The current average size of documents in the index is 2kB, so we'll need ~3.5TB of storage. For 1 replica this could mean 5 i2.xlarge instances if we go for SSDs on AWS.
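The arithmetic behind that estimate, for anyone checking:

```python
urls = 1.7e9            # pages in the latest CommonCrawl
avg_doc = 2 * 1000      # ~2 kB average document size in the index
primary = urls * avg_doc            # bytes for the primary shards
with_replica = primary * 2          # 1 replica doubles the footprint
print(primary / 1e12)               # ~3.4 TB, i.e. the ~3.5 TB above
```

With one replica that's roughly 7 TB total, which is what pushes the node count to ~5 i2.xlarge instances (each has 4x800 GB of SSD).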


Thanks for the answer.

I suggest not using AWS if you know that you'll need a server 24/7. Old-school hosters that offer dedicated servers are much cheaper for that use case.

There are several offers here in Europe where you can get an i7-6700, 64 GB RAM and a 1 TB SSD for less than €60/month. AWS would cost you at least 3-4x as much. You'll lose the flexibility of AWS, but save a ton of cash.


>AWS would cost you at least 3-4x as much. You'll lose the flexibility of AWS, but save a ton of cash.

Isn't there more to the analysis than just comparing cpu before we can conclude it will save a lot of money?

It looks like their servers[1] use ~150TB of source data that's already hosted on AWS disks. The source x.gz archives of the Common Crawl on AWS S3 are then imported to Elasticsearch disks that are also hosted on AWS.

To pull ~150TB of data at a network speed of 30 megabytes/sec[2] would take ~60 days to transfer from AWS to another USA datacenter like Rackspace.

(Copying data from AWS to AWS isn't instantaneous either but it won't take ~60 days. At 60 days, the next crawl archive would have been released before you finished importing the previous one!)
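The ~60-day figure follows from the cited transfer rate:

```python
data_bytes = 150e12   # ~150 TB of source data
rate = 30e6           # 30 MB/s cross-provider speed, per [2]
days = data_bytes / rate / 86400
print(round(days))    # ~58 days, i.e. roughly two months
```

So even a modest change in achievable bandwidth moves the answer by weeks, which is why question 1 below matters.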

Questions would be:

1) What are current 2016 network speeds between cloud providers?

2) What's the cost of ~150TB of network bandwidth?

3) From those datapoints, can we derive a rough rule-of-thumb where a certain amount of data exceeds the current capabilities (speed or economics) of the internet backbone available to projects like Common Search?

[1]https://about.commonsearch.org/developer/operations

[2]http://www.networkworld.com/article/2187021/cloud-computing/...


> 1) What are current 2016 network speeds between cloud providers?

I'm pretty sure if you need to ingest ~150 TB you can pull it from AWS/S3 much faster than you think. To absorb ~150 TB you'd need ~75 nodes, and since Common Crawl can be downloaded in partial segments, you can split the work across those 75 nodes downloading in parallel on 1 Gbit/s ports and pull it down far more quickly than your estimate suggests.

I'd bet you could pull ~150 megabytes/s in aggregate [16 Mbit/s per node].
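To put a rough number on the parallel case (using the figures in this comment, not measured speeds):

```python
total = 150e12                # bytes to ingest
nodes = 75
per_node_bytes = total / nodes        # 2 TB per node
per_node_rate = 16e6 / 8              # 16 Mbit/s -> 2 MB/s per node
days = per_node_bytes / per_node_rate / 86400
print(round(days, 1))                 # ~11.6 days
```

About 11-12 days instead of ~60 for a single 30 MB/s stream, roughly a 5x improvement.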

http://commoncrawl.org/the-data/get-started/

> The Common Crawl dataset lives on Amazon S3 as part of the Amazon Public Datasets program. From Public Data Sets, you can download the files entirely free using HTTP or S3.

https://www.hetzner.de/en/hosting/produkte_rootserver/ex41s

> 2) What's the cost of ~150TB of network bandwidth?

Free.

> There are no charges for overage. We will permanently restrict the connection speed if more than 30 TB/month are used (the basis for calculation is for outgoing traffic only. Incoming and internal traffic is not calculated). Optionally, the limit can be permanently cancelled by committing to pay € 1.39 per additional TB used. Please see here for information on how to proceed.

> 3) From those datapoints, can we derive a rough rule-of-thumb where a certain amount of data exceeds the current capabilities (speed or economics) of the internet backbone available to projects like Common Search?

I suspect you are greatly overestimating the difficulties since most DCs basically let you ingest/download for free because of the asymmetry on their networks.


One thing to consider is that we can build the index on AWS and then only do replication with other datacenters at the Elasticsearch level, which is ~50x smaller than the raw data.
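For scale (a back-of-the-envelope check using the ~50x figure above):

```python
raw = 150e12        # raw source data on AWS
index = raw / 50    # Elasticsearch index is ~50x smaller
print(index / 1e12) # ~3.0 TB to replicate, not ~150 TB
```

That ~3 TB is also consistent with the ~3.5 TB index size estimated earlier in the thread.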


Have you ever in real life heard someone say: "If 80kg is less than your weight, then you are overweight"?

Probably not. We say: "If your weight is more than 80kg then you are overweight."

When talking, we are used to mentioning "x" first and the numerical value we compare it to second, no matter whether it's less than or greater than what we want to compare it to.

If you swap these around, then whoever listens to you - or reads your code - will take a few seconds longer to understand what you mean.
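In code, with a hypothetical weight value, the two orderings look like this:

```python
weight = 85  # hypothetical value in kg

# Natural order: the variable first, then the constant,
# reading like the spoken sentence.
natural = weight > 80

# Reversed "Yoda" order: same meaning, slower to parse.
yoda = 80 < weight

print(natural, yoda)  # both True for weight = 85
```

The two conditions are logically identical; the argument here is purely about readability.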


The argument is mostly about double comparisons, i.e., checking for belonging to an interval.
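For the interval case, the "constant first" ordering is exactly what makes the comparison read like math. A minimal sketch in Python, which supports chained comparisons:

```python
weight = 75  # hypothetical value in kg

# The variable sits between the bounds, mirroring the
# mathematical interval 60 < weight <= 80.
in_range = 60 < weight <= 80
print(in_range)  # True for weight = 75
```

In languages without chained comparisons this becomes `60 < weight && weight <= 80`, where putting the lower bound first keeps the bounds in ascending order.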


That webpage has a last-modified date from 2009...


Never call yourself a hacker. To non-tech people "hacker" has the meaning of "illegally hacks into computers".


True, but if he's looking for contracting work hopefully his CV will be in the hands of an engineer rather than a hiring manager/HR.


Engineers only very rarely make decisions about contractors. If you're trying to do this stuff, you need to talk to managers.

