Hacker News | mschoebel's comments

I'm in Germany and I have a Geiger counter running 24/7. I just looked at the data for January and February and the only thing that I notice is a VERY slightly higher reading on February 4th with 0.1727 microSievert/hour. Average for January was 0.1674, lowest was 0.1631, highest was 0.1703. So February 4th was less than 6% higher than the lowest value from January.
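For reference, the "less than 6%" figure follows directly from the numbers above; a quick sketch:

```python
# Readings in microsievert/hour, taken from the comment above.
jan_avg, jan_min, jan_max = 0.1674, 0.1631, 0.1703
feb4 = 0.1727

pct_above_lowest = (feb4 - jan_min) / jan_min * 100
print(f"Feb 4 was {pct_above_lowest:.1f}% above January's lowest reading")
```

That comes out to about 5.9%, i.e. just under 6%.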

The difference was so small that I had just attributed it to normal fluctuations when I first saw it. Whatever caused this, so far it looks like it was a very small event.

I could probably provide a CSV-file with the raw data if anyone is interested. My Geiger counter stores a value every 5 minutes.


Could it be related to the accident in France? The date fits.

https://www.heise.de/tp/features/Flamanville-Unfall-im-Fiask...

http://www.fr-online.de/politik/unfall-explosion-in-akw-in-f...

Could you please provide the CSV file?


Tab-delimited CSV with the raw data:

https://geekregator.com/files/HackerNews-RawData2017.zip

The timestamps are in UTC+2. I keep the clock in the Geiger counter on daylight saving time. Too lazy to change it. :)


Out of total curiosity, what does your setup look like? Do you have a model # of what you use? Never read anything about this before. Thanks.


Not sure about OP's setup, but an easy one to get into is uRad[1]. Along with logging, it shares your readings around the globe, which helps with finding any kind of event like this. Though I'm not sure that they've got any kind of statistical analysis going on.

http://www.uradmonitor.com/


How expensive are those sensors?


Their Indiegogo put it at $90 for a kit to solder, and $120 assembled. No idea what the cost is now; it appears you have to email them via their website to order one.

https://www.indiegogo.com/projects/uradmonitor-environment-h...


I have the GammaScout Alert: https://www.gamma-scout.com/EN/Home.php

It stores the data in its internal memory which I then transfer via its USB port to my computer once a week.



Let's see if I got that right: Specification says I must load the AMP script from cdn.ampproject.org?

If yes, then there is no effing way that I will ever use this. I will NOT use something that forces me to load scripts from a host that I have no control over. Does nobody see what a HUGE security risk that is???


You got that right, and it's a fairly common critique of AMP.

The target group for AMP (traditional publishing sites) already loads crap from all over the net in general, and from Google in particular, so they don't care. But it leaves a very bad impression for an "open standard", yes.

Technically browsers could catch that include and replace it with local/cached logic, but I don't think that is happening or planned yet.


> Does nobody see what a HUGE security risk that is???

I dispute 'HUGE' (or even 'huge'). No more than using any CDN controlled by a large company.

1. You're welcome to say that CDNs are a risk in general but many reasonable people would disagree.

2. You're welcome to claim that Google is not to be trusted, but it rather depends on your audience. If you're providing a platform for the especially sensitive (anything related to politics/human rights/government/medical/financial might warrant extra caution) then I'd agree, but for the large majority of sites, loading javascript from Google is an acceptable trade-off.

So - I'm not disputing it's a security risk - I'm just not sure it's HUGE-in-capital-letters for most people.


Once AMP stabilizes, I'm hoping Google will encourage the use of SRI to ensure that the content is what a site expects: https://developer.mozilla.org/en-US/docs/Web/Security/Subresource...
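As a sketch of how an SRI integrity value is computed (the one-line script hashed here is made up for illustration, not the real AMP runtime):

```python
import base64
import hashlib

def sri_hash(script_bytes: bytes) -> str:
    """Return an SRI integrity value: 'sha384-' + base64(SHA-384 digest)."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode()

# Hash the exact bytes of the script you vetted; any change to the
# file on the CDN would then fail the browser's integrity check.
print(sri_hash(b'console.log("hi");'))
```

The resulting string goes in the `integrity` attribute of the script tag, so the browser refuses to execute the file if the CDN serves different bytes.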


Still unacceptable as it would still cause my users to expose their IP address to someone else's server.


Unfortunately the ubiquity of FB and G+ buttons, Google analytics and CDN use has raised a generation of web developers who don't see that as a problem.


Certificate Transparency will reveal the domains you connect to over TLS to a Google server anyway. Assuming you don't already use Google's DNS, that is.


Google could and should change their requirement to be that the integrity value for the script must be in their approved list rather than requiring their path.


Umm... FAQ says:

"What licence does Ultibo use? The GNU LGPL version 2.1, the same licence used by many other popular open source products."


I stand corrected. I thought it said GNU GPL.


FWIW... I just logged in to my Yahoo Account and removed the security questions. Just to be sure. I had already changed my password a few months ago when first rumors of this came up. I'm pretty sure that the option to remove the security questions wasn't there back then.


Pascal


Can you elaborate on this?


See: https://deusu.org

The backend is written in Pascal.

With the exception of the blog. That is written in Node.js. I did that as a way to learn Node.js. But as soon as I find some time I'll rewrite that in Pascal too.


I probably should also add that I'm using FreePascal - not Delphi - and that the servers are a mix of Windows and Linux.


Do you have a rough estimate of how many servers you will need for Elasticsearch for the 1.7bn URLs of the latest CommonCrawl?


The current average size of documents in the index is 2kB, so we'll need ~3.5TB of storage. For 1 replica this could mean 5 i2.xlarge instances if we go for SSDs on AWS.
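The arithmetic behind that estimate, for anyone checking:

```python
urls = 1.7e9            # pages in the latest CommonCrawl
avg_doc = 2 * 1000      # ~2 kB average document size in the index
primary = urls * avg_doc            # bytes for the primary shards
with_replica = primary * 2          # 1 replica doubles the footprint
print(primary / 1e12)               # ~3.4 TB, i.e. the ~3.5 TB above
```

With one replica that's roughly 7 TB total, which is what pushes the node count to ~5 i2.xlarge instances (each has 4x800 GB of SSD).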


Thanks for the answer.

I suggest not using AWS if you know that you'll need a server 24/7. Old-school hosters that offer dedicated servers are much cheaper for that use case.

There are several offers here in Europe where you can get an i7-6700, 64 GB RAM and a 1 TB SSD for less than €60/month. AWS would cost you at least 3-4x as much. You'll lose the flexibility of AWS, but save a ton of cash.


>AWS would cost you at least 3-4x as much. You'll lose the flexibility of AWS, but save a ton of cash.

Isn't there more to the analysis than just comparing cpu before we can conclude it will save a lot of money?

It looks like their servers[1] use ~150TB of source data that's already hosted on AWS disks. The source x.gz archives of the Common Crawl on AWS S3 are then imported to Elasticsearch disks that are also hosted on AWS.

To pull ~150TB of data at a network speed of 30 megabytes/sec[2] would take ~60 days to transfer from AWS to another USA datacenter like Rackspace.

(Copying data from AWS to AWS isn't instantaneous either but it won't take ~60 days. At 60 days, the next crawl archive would have been released before you finished importing the previous one!)
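The ~60-day figure follows from the cited transfer rate:

```python
data_bytes = 150e12   # ~150 TB of source data
rate = 30e6           # 30 MB/s cross-provider speed, per [2]
days = data_bytes / rate / 86400
print(round(days))    # ~58 days, i.e. roughly two months
```

So even a modest change in achievable bandwidth moves the answer by weeks, which is why question 1 below matters.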

Questions would be:

1) What are current 2016 network speeds between cloud providers?

2) What's the cost of ~150TB of network bandwidth?

3) From those datapoints, can we derive a rough rule-of-thumb where a certain amount of data exceeds the current capabilities (speed or economics) of the internet backbone available to projects like Common Search?

[1]https://about.commonsearch.org/developer/operations

[2]http://www.networkworld.com/article/2187021/cloud-computing/...


> 1) What are current 2016 network speeds between cloud providers?

I'm pretty sure if you need to ingest ~150 TB you can pull it from AWS/S3 much faster than you think. To absorb ~150 TB you'd need ~75 nodes, and since Common Crawl can be downloaded in partial segments, you can split the work across those 75 nodes downloading in parallel on 1 Gbit/s ports and pull it down far more quickly than your estimate suggests.

I'd bet you could pull ~150 megabytes/s in aggregate [16 Mbit/s per node].
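To put a rough number on the parallel case (using the figures in this comment, not measured speeds):

```python
total = 150e12                # bytes to ingest
nodes = 75
per_node_bytes = total / nodes        # 2 TB per node
per_node_rate = 16e6 / 8              # 16 Mbit/s -> 2 MB/s per node
days = per_node_bytes / per_node_rate / 86400
print(round(days, 1))                 # ~11.6 days
```

About 11-12 days instead of ~60 for a single 30 MB/s stream, roughly a 5x improvement.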

http://commoncrawl.org/the-data/get-started/

> The Common Crawl dataset lives on Amazon S3 as part of the Amazon Public Datasets program. From Public Data Sets, you can download the files entirely free using HTTP or S3.

https://www.hetzner.de/en/hosting/produkte_rootserver/ex41s

> 2) What's the cost of ~150TB of network bandwidth?

Free.

> There are no charges for overage. We will permanently restrict the connection speed if more than 30 TB/month are used (the basis for calculation is for outgoing traffic only. Incoming and internal traffic is not calculated). Optionally, the limit can be permanently cancelled by committing to pay € 1.39 per additional TB used. Please see here for information on how to proceed.

> 3) From those datapoints, can we derive a rough rule-of-thumb where a certain amount of data exceeds the current capabilities (speed or economics) of the internet backbone available to projects like Common Search?

I suspect you are greatly overestimating the difficulties since most DCs basically let you ingest/download for free because of the asymmetry on their networks.


One thing to consider is that we can build the index on AWS and then only do replication with other datacenters at the Elasticsearch level, which is ~50x smaller than the raw data.
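For scale (a back-of-the-envelope check using the ~50x figure above):

```python
raw = 150e12        # raw source data on AWS
index = raw / 50    # Elasticsearch index is ~50x smaller
print(index / 1e12) # ~3.0 TB to replicate, not ~150 TB
```

That ~3 TB is also consistent with the ~3.5 TB index size estimated earlier in the thread.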


Have you ever in real life heard someone say: "If 80kg is less than your weight, then you are overweight"?

Probably not. We say: "If your weight is more than 80kg then you are overweight."

When talking, we are used to mentioning "x" first and the numerical value we compare it to second, no matter whether it's less than or greater than what we want to compare it to.

If you swap these around, then whoever listens to you - or reads your code - will take a few seconds longer to understand what you mean.
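In code, with a hypothetical weight value, the two orderings look like this:

```python
weight = 85  # hypothetical value in kg

# Natural order: the variable first, then the constant,
# reading like the spoken sentence.
natural = weight > 80

# Reversed "Yoda" order: same meaning, slower to parse.
yoda = 80 < weight

print(natural, yoda)  # both True for weight = 85
```

The two conditions are logically identical; the argument here is purely about readability.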


The argument is mostly about double comparisons, i.e., checking for belonging to an interval.
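For the interval case, the "constant first" ordering is exactly what makes the comparison read like math. A minimal sketch in Python, which supports chained comparisons:

```python
weight = 75  # hypothetical value in kg

# The variable sits between the bounds, mirroring the
# mathematical interval 60 < weight <= 80.
in_range = 60 < weight <= 80
print(in_range)  # True for weight = 75
```

In languages without chained comparisons this becomes `60 < weight && weight <= 80`, where putting the lower bound first keeps the bounds in ascending order.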


That webpage has a last-modified date from 2009...


Never call yourself a hacker. To non-tech people "hacker" has the meaning of "illegally hacks into computers".


True, but if he's looking for contracting work hopefully his CV will be in the hands of an engineer rather than a hiring manager/HR.


Engineers only very rarely make decisions about contractors. If you're trying to do this stuff, you need to talk to managers.

