The perils of the “real” client IP (adam-p.ca)
194 points by zdw on March 5, 2022 | 55 comments


I'd argue that this whole concept is inherently flawed. My home internet is behind a carrier grade NAT. Many services have rate limited the IP. It's to the point where I'm strongly considering paying a "streaming friendly" VPN to black hat subvert these reputation schemes.

The biggest offenders are internet of things backends for registered devices I "own" and streaming services I'm paying for.

Edit: Here's a concrete example: I looked up my CGNAT IP and saw that it was flagged as a malicious actor because someone ran a port scan from it a month ago.

Now that this offence has started to age out, a few services have started working again. My entire ISP can be trivially DoS'ed with a Raspberry Pi and nmap!

Of course, other services seem to just do per-IP rate limiting, so they run at << 1MB/s during peak hours. Fast.com and Speedtest.net claim the connection is healthy.


This is an interesting point. As more users get pigeonholed into sharing the same IPv4 addresses, IP-based rate limiting will essentially become useless. I wonder if this will help drive IPv6 adoption.

> The biggest offenders are internet of things backends for registered devices I "own" and streaming services I'm paying for.

Wait those are the services that are rate-limiting you? They're not smart enough to rate-limit based on your account credentials?


If IPv6 reputation is done at the /48 level but you have, say, a /56 then you're sharing reputation with up to 255 other customers.
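
To make that concrete, here's a rough Go sketch of how a reputation system might bucket addresses at the /48 level (illustrative only; the example addresses and bucketing policy are assumptions, not any particular vendor's scheme):

    package main

    import (
        "fmt"
        "net/netip"
    )

    // reputationKey masks an address down to its /48, so every customer
    // inside that /48 shares one reputation bucket.
    func reputationKey(addr netip.Addr) netip.Prefix {
        p, _ := addr.Prefix(48) // error only if 48 > addr.BitLen()
        return p
    }

    func main() {
        a := netip.MustParseAddr("2001:db8:1:aa00::1") // customer A's /56
        b := netip.MustParseAddr("2001:db8:1:bb00::1") // a different customer's /56
        fmt.Println(reputationKey(a) == reputationKey(b)) // true: same /48 bucket
    }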


Even worse, there are ISPs who hand out a /64, which you shouldn't even do.

I ran into this problem 15 years or so ago, when I built my own IPv6 tunnel for home use using a Hetzner server. I couldn't edit Wikipedia, asked why, and apparently they had blocked a whole /48 because of spam. Then they argued the whole provider should be blocked because it's a "tunnel provider" (it isn't), at which point I didn't even want to argue anymore and just stopped editing.


/56? What about a /64 per customer? The ISP just ignores RIPE rules and any explanation of why a /64 is not enough for a complex home network.


Still a better situation than v4: assignments tend to be fixed and never shared.


> I wonder if this will help drive IPv6 adoption.

Only if users complain enough. Often they complain to the reverse proxy host[0] or website itself[1], when it would be a solved problem if IPv6 were properly deployed further.

0: top result on google for 'Cloudflare blocked me' with 38k views https://community.cloudflare.com/t/cloudflare-is-blocking-me...

1: https://www.coursera.support/s/question/0D51U00003BlYiVSAV/y...


> They're not smart enough to rate-limit based on your account credentials?

This works until a VPN node starts having tens of thousands of users from one IP address.


Rate limiting based on account credentials is a solution to having tons of users connecting from one IP address...


Depends on what they are rate limiting. If they are trying to rate limit logins to prevent brute-force attacks... I could see a naive implementation being problematic for NATed users.
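
For the authenticated case, it can be as simple as changing the limiter key. A minimal Go sketch (accountFromRequest is a hypothetical stand-in for however you identify the caller; for login endpoints you'd key on the target username instead, since the attacker isn't authenticated yet):

    package ratelimit

    import (
        "net/http"
        "sync"

        "golang.org/x/time/rate"
    )

    var (
        mu       sync.Mutex
        limiters = map[string]*rate.Limiter{} // keyed by account, not client IP
    )

    func limiterFor(account string) *rate.Limiter {
        mu.Lock()
        defer mu.Unlock()
        l, ok := limiters[account]
        if !ok {
            l = rate.NewLimiter(rate.Limit(5), 10) // 5 req/s, burst 10, per account
            limiters[account] = l
        }
        return l
    }

    // ByAccount rejects requests once an account exceeds its budget, so
    // thousands of users behind one CGNAT IP don't share a single bucket.
    func ByAccount(next http.Handler, accountFromRequest func(*http.Request) string) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if !limiterFor(accountFromRequest(r)).Allow() {
                http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }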


Now that you’ve put it that way, I hope someone is a black-hat accelerationist who deliberately makes a worm to turn all IPv4 addresses into spam IPs. Force v6 adoption by making the internet unusable for the laggards.


>flagged as a malicious actor because someone ran a port scan from it a month ago

I can't believe some companies assume a port scan is some sort of attack. I've actually gotten automated abuse emails back when port scanning some addresses. Madness.


There’s always going to be tension when it comes to “strong signals someone is acting in bad faith” and the exceptions.

I mean if you saw someone standing on your sidewalk taking note of all the unlocked doors and windows they see you would be right to be kinda suspicious. The over under on “they’re casing your house vs they’re a window lock enthusiast” isn’t great for the enthusiast. But unlike IRL you can just say, “ya know what I don’t really care to support any exceptions and so anyone who’s checking my windows is autobanned.”


> I'm strongly considering paying a "streaming friendly" VPN to black hat subvert these reputation schemes

Nothing "black hat" with that, at all.

...Unless you're referring to that those "streaming-friendly VPNs" are (acquiring residential IPs in sketchy ways from sketchy providers, such as compromised smart devices and other customers).


To be fair, this is a real problem that you sometimes can't ignore. There are plenty of situations where you need to filter out a bad actor by ip address.


> There are plenty of situations where you need to filter out a bad actor by ip address.

It isn't so much that you need to do it that way, but that there is no more practical way despite the inherent problems. Which has effectively the same end result, but thinking that way highlights the fact that CGNAT and other IPv4 limit “solutions” cause as many problems as they solve.


Absolutely, it is a good reason to avoid providers with CGNAT.


Unfortunately most users are not aware of the difference, let alone the implications, and those that are may not really have much of a choice of ISPs (maybe all the options available to them, or at least the affordable and otherwise practical options, use CGNAT). And in that circumstance there isn't going to be enough “voting with their feet” to make providers sit up and listen.


Do you not have the ability to lease a static IP from the ISP for a nominal amount? From my experience this is always an option even on consumer lines.


>Do you not have the ability to lease a static IP from the ISP for a nominal amount? From my experience this is always an option even on consumer lines.

Where I live (US), most ISPs won't give you static IPv4 addresses unless you are on a "business" plan. Beyond that, you also have to pay for those static IP addresses (it's ~US$25/month for 5 static IPv4 addresses, with one of those wasted because I'm forced to use the ISP's router, even though I already have one).

Perhaps where you are, static IPv4 addresses are available on "consumer" plans, but here not so much.


No. It's a local rural ISP. I don't think they're particularly competent.


It may be an option, but as a consumer, why would this be the optimal answer? All this does is hide costs for you.

This is a problem with service. If the service doesn't want to fix it, then that sounds like a potential market opportunity, no? I'm new to thinking about the business side, but it seems like I want to remove any obstacle I can preventing a customer from using my service.


It takes a little getting used to. The key here is that it's a lot easier to convince your customers that any flaws in your service are just 'the way it is' than to even attempt to do anything about them.


> Do you not have the ability to lease a static IP from the ISP for a nominal amount?

and this is why IPv6 isn't coming anytime soon, at least not within the next few decades.


Nah, it is. It just won't be one great event where everyone switches at the same time. My current ISP ships you a cheapo router that has it enabled by default. My previous one did too, and even had CGNAT. It just works.

Every time IPv6 comes up you read comments here on HN about how it caused great issues for someone and they turned it off, but unless they are on a really shitty small ISP, it's their own fault: they didn't use the router from the ISP but had to build their own setup and messed something up.

As people switch ISPs or get replacement routers that have it enabled, the world gradually moves over, and nobody apart from the nerds takes notice.


Assuming you raised this with the services, what responses did you get - if any?


This is a good point. I should have known better, but I just checked a setup I made a while back (nginx "proxy_set_header" [1] + Rust http::HeaderMap::get [2]), and I was doing it wrong. The nginx command appends, and get returns the first. Oops. At least I'm not doing IP-based auth, but I am looking for this to detect if the client is using TLS (up to the reverse proxy server, the backhaul is plaintext right now, like [3]) as well as logging the IP.
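
For anyone auditing their own setup, the safe direction is the opposite of what get-the-first-value gives you. Here's a Go sketch of the rightmost-untrusted approach the article recommends (the trusted() policy is a stand-in you'd replace with your actual proxy list):

    package clientip

    import (
        "net/http"
        "net/netip"
        "strings"
    )

    // trusted reports whether ip is one of our own proxies (stand-in policy).
    func trusted(ip netip.Addr) bool {
        return ip.IsLoopback() || ip.IsPrivate() // adjust to your topology
    }

    // FromRequest walks X-Forwarded-For from the right, skipping our own
    // proxies; the first untrusted hop is the best guess at the client.
    // Reading the leftmost value instead hands control to the attacker.
    func FromRequest(r *http.Request) (netip.Addr, bool) {
        joined := strings.Join(r.Header.Values("X-Forwarded-For"), ",")
        parts := strings.Split(joined, ",")
        for i := len(parts) - 1; i >= 0; i-- {
            ip, err := netip.ParseAddr(strings.TrimSpace(parts[i]))
            if err != nil {
                return netip.Addr{}, false // garbage in the header: don't guess
            }
            if !trusted(ip) {
                return ip, true
            }
        }
        return netip.Addr{}, false
    }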

Another problem that I suspect is common: connections that sometimes go through a reverse proxy, sometimes not. In my setup, the "not" ones are in theory trusted (on the LAN or localhost), but it still doesn't feel right to not really know whether the header came from the proxy or not. I could add a shared secret or something to the header, but given that a LAN-based attacker could sniff everything (not just this but also the actual user credentials/traffic) via ARP spoofing or something, it probably doesn't make much sense to bother before getting rid of the plaintext backhaul.

Speaking of which, it'd be nice to make the TLS end-to-end: from the user through a proxy that doesn't decrypt, all the way to the application server. But I'm not sure what the state of the art there is. It used to be possible to dispatch based on the SNI and then proxy at the TCP level, but I know TLS 1.3 added encrypted SNI. I'm not sure whether the proxy can force clients to fall back to non-encrypted SNI; I could imagine the spec authors making a point of not allowing this, so a man-in-the-middle can't find the intended host. Maybe the two legs just have to be encrypted separately now.

[1] http://nginx.org/en/docs/http/ngx_http_proxy_module.html#pro...

[2] https://docs.rs/http/0.2.6/http/header/struct.HeaderMap.html...

[3] https://blog.encrypt.me/2013/11/05/ssl-added-and-removed-her...


> Speaking of which, it'd be nice to make the TLS end-to-end: from the user through a proxy that doesn't decrypt, all the way to the application server. But I'm not sure what the state of the art there is. It used to be possible to dispatch based on the SNI and then proxy at the TCP level, but I know TLS 1.3 added encrypted SNI. I'm not sure whether the proxy can force clients to fall back to non-encrypted SNI; I could imagine the spec authors making a point of not allowing this, so a man-in-the-middle can't find the intended host. Maybe the two legs just have to be encrypted separately now.

My boringproxy[0] project has a mode for doing SNI routing at the reverse proxy, and tunneling it all the way back to a local client machine which handles the actual TLS termination. The client also gets certs automatically from Let's Encrypt. So you end up with automated end-to-end encryption where the boringproxy server/VPS can't decrypt any of the traffic.

Early versions of boringproxy acted as a more traditional reverse proxy, terminating the TLS at the server then making a new HTTP request upstream. Eventually I added the ability to move the HTTP proxy into the client to enable e2ee. Most recently I implemented raw TLS all the way. There are tradeoffs:

Pros:

* End-to-end encryption.

* Simplicity. You just need to peek at the SNI and look up which TCP tunnel to pipe into (rough sketch at the end of this comment).

* Things like WebSockets and other hop-by-hop requests don't need to be implemented by the proxy.

Cons:

* You lose the ability to do compression/caching/CDN/etc at the server.

* You can't support new protocols like HTTP/2, HTTP/3, etc at your server because it only understands TCP wrapped in TLS.

In practice, I think these tradeoffs are totally worth it for self-hosting. I've found performance to be great for my purposes.

As for encrypted SNI (ESNI, now wrapped into encrypted client hello, ECH), I'm pretty sure it will be implemented in a tiered approach like you surmise. So in my case the boringproxy server will have the keys to decrypt the client hello, but the origin servers will still control the actual TLS decryption.

[0]: https://boringproxy.io
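
For the curious, the SNI-routing core can be sketched in a few lines of Go. parseSNI here is a hypothetical helper (real code has to parse the TLS record and ClientHello properly, or use a library), and route is a stand-in for the tunnel lookup; the point is that the proxy only peeks, routes, and pipes bytes:

    package main

    import (
        "bufio"
        "io"
        "log"
        "net"
    )

    // parseSNI is hypothetical: it peeks at the TLS ClientHello via the
    // bufio.Reader (consuming nothing) and returns the server_name.
    func parseSNI(br *bufio.Reader) (string, error) { /* ... */ return "", nil }

    // route maps an SNI name to the tunnel/backend that terminates TLS.
    // Stand-in: a real implementation looks up the registered tunnel.
    func route(sni string) (net.Conn, error) {
        return net.Dial("tcp", "127.0.0.1:8443")
    }

    func handle(client net.Conn) {
        defer client.Close()
        br := bufio.NewReader(client)
        sni, err := parseSNI(br)
        if err != nil {
            return
        }
        backend, err := route(sni)
        if err != nil {
            return
        }
        defer backend.Close()
        // Pipe bytes both ways; the proxy never holds TLS keys, so the
        // stream stays encrypted end to end. br still holds the peeked
        // ClientHello bytes, so the backend sees the full handshake.
        go io.Copy(backend, br)
        io.Copy(client, backend)
    }

    func main() {
        ln, err := net.Listen("tcp", ":443")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go handle(conn)
        }
    }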


> I know TLS 1.3 added encrypted SNI. Not sure if the proxy can force clients fall back to non-encrypted SNI.

Encrypted SNI is not the default. You've got to do a fair bit of work to do it. You'd probably want your proxy to be able to decode it to direct the traffic anyway.


I have just checked and the nginx directive "proxy_set_header" does NOT append a value, it replaces. It will only be added to the existing value if you set it to `$proxy_add_x_forwarded_for` instead of `$remote_addr`.
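
For reference, the two variants side by side (nginx config sketch, per the module docs linked in the parent):

    # Replaces whatever X-Forwarded-For the client sent with just the peer address:
    proxy_set_header X-Forwarded-For $remote_addr;

    # Appends the peer address to the inbound header (leftmost part stays spoofable):
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;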


Making decisions based on HTTP headers is always an opportunity to be surprised. A few years back we had implemented rudimentary access control on a server based on X-Forwarded-For. In front of that server, we had two chained instances of haproxy, one version 1.x, the other 2.x. Little did we know that haproxy 1 and 2 handled headers completely differently with regard to case-sensitivity. So we ended up not only failing to correctly remove untrusted headers, we also got two different headers with every request (mixed-case and lower-case) that our server handled differently (which was a bug).


That was one of the most comprehensive posts I've read on this topic, well done.

One addition I'd suggest is going into the implications of enabling "client port preservation" with an AWS ALB. This can be a death sentence if you decide to turn it on without intimate knowledge of how all connected apps are trying to figure out the real IP. It appends the client port to the IP with a colon in X-Forwarded-For, and if you happen to read IPs without expecting this, it can cause IP values to appear invalid. What happens is very app-dependent, but it has a lot of side-effect potential.
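
If you can't rule that option out, each XFF entry has to be parsed with an optional port in mind. A Go sketch (parseForwardedEntry is hypothetical; the bracketed-IPv6 case is an assumption about what a sane producer would emit, not documented ALB behavior):

    package main

    import (
        "fmt"
        "net"
        "net/netip"
        "strings"
    )

    // parseForwardedEntry accepts "1.2.3.4", "1.2.3.4:5678", and (for
    // IPv6) "[2001:db8::1]:5678", returning just the address.
    func parseForwardedEntry(s string) (netip.Addr, error) {
        s = strings.TrimSpace(s)
        if host, _, err := net.SplitHostPort(s); err == nil {
            s = host // ALB "client port preservation" appended a port
        }
        return netip.ParseAddr(s)
    }

    func main() {
        fmt.Println(parseForwardedEntry("203.0.113.7:41235")) // 203.0.113.7 <nil>
        fmt.Println(parseForwardedEntry("203.0.113.7"))       // 203.0.113.7 <nil>
    }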


Parsing this header is such a nightmare. HAProxy had a CVE a while ago where they stopped parsing the header if they hit a quote in the middle, which allowed you to forge the right-most IP.

As a result I had a long conversation with AWS where I told them it was ridiculous that ALBs allowed garbage to be inserted in the only header that contains this important information… suffice to say they did not care.


I didn't know about that option. Here's a link for anyone who wants details: https://docs.aws.amazon.com/elasticloadbalancing/latest/appl...

That's pretty bad. It makes the already perilous header even worse. I'll add a note or addendum about it to the post when I get a chance.

(@kingforaday: You're welcome!)


I also would like to express my appreciation for the post and hope the author will see this comment. Thanks!


For many years, a very prominent computer science journal used XFF for guarding access --- if you set it to an IP of some well-known universities, you'd be able to download all you want.

I feel comfortable about disclosing this now since we have things like SciHub (which may have used this trick at one point), and they fixed it a few years ago.


This is a good write up. I identified a similar issue almost a decade ago in a variety of Flask apps: https://esd.io/blog/flask-apps-heroku-real-ip-spoofing.html


FWIW, we have had an open PR in Caddy to improve client IP handling (opened a couple months ago). There's definitely some good points being made in this article, and I adjusted the PR accordingly.

https://github.com/caddyserver/caddy/pull/4507


I think one issue is that we expect to rely on a standard here, while only a proprietary solution, specific to the infrastructure, seems to make sense.

Basically, the first device in the infrastructure, whether it is a load balancer, a cache, etc., should set (or overwrite, if it's been spoofed) a custom proprietary header with the client IP. And that's what you can rely on: either it's set and you have the IP, or it's not and you can't really trust anything else.

He mentions Azure and a few others doing this and that sounds like the only correct approach.

(Although there's still an issue when the infrastructure is changed, and the application code is not updated - in that case the old header might no longer be set, and could now be spoofed)


I agree. However, you need to be _really_ careful about using the IP provided by your CDN, etc. For example, Akamai sets `True-Client-IP`... but doesn't overwrite it if it's present. By default, Fastly does the same with `Fastly-Client-IP`. (Yes, Azure is better, but make sure you pick the _right_ special header.)

Minefields within minefields.


The standard is not really the problem. The standard existing allows you to use off the shelf middleware and widespread libraries to solve your problem.

The problem here is that your border box should remove any such header that comes from an untrusted network, and not append to them.
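
In middleware terms, the border behavior might look like this Go sketch (the header list is illustrative, not a complete inventory):

    package border

    import (
        "net"
        "net/http"
    )

    // AtBorder runs on the outermost proxy we control: it discards any
    // client-supplied identity headers and sets ours fresh, so nothing
    // downstream ever sees an attacker-chosen value.
    func AtBorder(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            for _, h := range []string{"X-Forwarded-For", "True-Client-IP", "X-Real-IP"} {
                r.Header.Del(h) // drop, never append to, untrusted input
            }
            if host, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
                r.Header.Set("X-Forwarded-For", host) // the TCP peer we accepted, nothing else
            }
            next.ServeHTTP(w, r)
        })
    }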


When I worked in (quasi-)SRE at a somewhat busy and popular EU-based website, we "invented" our own HTTP header namespace (think of the "X-" prefix that you see in the wild for "non-standard" HTTP headers, which often end up as ossified de-facto standards without much of an RFC to specify them) for stuff that our internal infrastructure components, like the HTTP- and HTTPS-terminating reverse proxies, would add.

The systems involved took proper care that those headers and their values really came from us, and not some outside system. Of course, we also had some header akin to Our-Site-Prefix-Peer-IP-Addr, to know which TCP peer the outermost system was actually talking to. In combination with evaluating other headers, like the X-Forwarded-For that this article handles in delightful detail, there was a lot of interesting detective work to be had in terms of which clients tried to spoonfeed us which kind of would-be-spoofed data.

These days, Carrier-grade NAT and the slow death of the End-to-end principle make these techniques woefully inadequate to discern between "legitimate" use, and clients you'd rather keep out. Sadly.


:+1: for the effort to document this, and coordinating the disclosure with the vendors. This mainly talks about rate-limiting bypass/DoS, but if XFF is also used for audit trail logging of IP addresses and/or IP-based access lists, then the security implications can be even more severe, with falsified audit logs and bypassed security controls.

Setting up an application server behind a reverse proxy to use the "real" client IP is unfortunately very typically just a trial-and-error based process, with very little room for this kind of nuanced security-consciousness, because the configuration and exact behavior are all so non-standardized across different implementations of reverse proxies and application servers... Typically users will just try different configuration settings until they find a combination that seems to work, and you would actually need to dig in with curl and tshark to understand the edge cases, because the documentation of the application-specific implementation is typically just one brief sentence...

Getting XFF working correctly through a complicated HTTP stack with multiple layers of nginx/haproxy/apache proxies (yes, they have different non-overlapping feature sets), custom backends implementing custom XFF handling/forwarding, and jetty/spring backends upgraded across a major version bump that changed the implementation and configuration properties related to XFF handling was insanely difficult. And of course it broke when migrating from a F5 LB to an AWS ALB, because it behaved differently for that one edge-case for an important customer... highly recommended to just override the entire XFF header with a single value at the appropriate point in your stack, if at all possible.

If just the naive leftmost-first vs rightmost-ish-with-configurable-list-of-trusted-upstream-proxies wasn't enough, then yeah, HAProxy does the thing where it adds a new, 100% standards-compliant header line [1] that maybe 1% of backend application developers have ever tested with (quick demo below). And trying to configure HAProxy to interpret the incoming XFF headers for logging/access-control ~is~/was even more weird [2].

[1] https://github.com/haproxy/haproxy/issues/44

[2] https://github.com/haproxy/haproxy/issues/90
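
The multiple-header-lines case is easy to reproduce; in Go, for instance (sketch; the standard treats repeated fields as equivalent to one comma-joined value, but naive code only ever reads the first):

    package main

    import (
        "fmt"
        "net/http"
    )

    func main() {
        h := http.Header{}
        h.Add("X-Forwarded-For", "198.51.100.9") // first header line
        h.Add("X-Forwarded-For", "203.0.113.7")  // second line, same name

        fmt.Println(h.Get("X-Forwarded-For"))    // "198.51.100.9" - naive code sees only this
        fmt.Println(h.Values("X-Forwarded-For")) // [198.51.100.9 203.0.113.7]
    }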


> Setting up an application server behind a reverse proxy to use the "real" client IP is unfortunately very typically just a trial-and-error based process

This is very true, and I narrowly missed getting burned myself by thinking, "looks good to me!" after seeing my own IP.

However... I take a dig in the conclusions at the implementers of security-related libraries. I don't think it's okay for them to stop at "seems to work". They should be taking the time to fully understand the problem space.

> highly recommended to just override the entire XFF header with a single value at the appropriate point in your stack, if at all possible

I agree, and probably should have emphasized that approach. (And maybe will, and will definitely add a note at the end.) I didn't really give any/enough attention to configuring your first proxy with a custom single-IP header. (Partly because I was writing more for people who are trying to use what's available.)


Great write up. Every software engineer (including myself) who has received vague instructions like "rate limit the service based on client IP address" or "only allow requests from these IP ranges" has gone down this rabbit hole.

My TL;DR on the topic is:

1. There's no single correct mechanism for determining the client IP address. Any standards and headers you may have come across are merely hints. Your solution has to be custom-built taking into account your use case, your network topology and the service provider(s) you are reliant on. If any of the above changes, your code has to change as well.

2. There is no single "correct" answer for client IP address, no matter how much effort you put into 1. Any value you come up with is at best an approximation, so treat it as such, especially if you are using the value for something critical like authorization.


This is my favourite TL;DR so far.


Years ago, I found that the disposable email service Guerrilla Mail was blindly adding IPs from X-Forwarded-For to outgoing emails, which could have been abused to make it look like any IP you wanted had sent an email.

https://voidnet.tech/chaos/blog/do-not-trust-x-forwarded-for...


TL;DR don't trust the values of headers you don't control. If you're not sure whether you control them or not, read the rest of this article.


I think a part of the problem though is that "control" is not always well defined. I've seen headers used that were "controlled" because a proxy sat in front and did filtering. But then the app got moved behind a different proxy, and what they controlled was no longer such.

But, if you don't trust any headers at all, it's tough to do much with HTTP.


Right? "Don't depend on anything fakeable."


It is all ridiculously complicated. Envoy's documentation makes it pretty clear what concessions are made: https://www.envoyproxy.io/docs/envoy/latest/configuration/ht... I link the documentation not for the informative aspect, but just for you to gawk at the length and how worried the author is about it all ;)

I personally prefer that the load balancer speak "proxy protocol" to the front proxy, and then the front proxy simply make a decision as to what the external IP is and relay that to upstream applications with a proprietary header (x-envoy-external-address). Upstream applications that really want to be careful about the external address then verify the signature of envoy's TLS certificate when the connection comes in. (I use some mTLS for my homelab stuff, but don't have any software that cares to check this. I use Envoy's rate limiter rather than application specific rate limiting, and use Envoy's logs to get the IP address and the x-b3-trace-id header to correlate access logs with application logs.)
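
For the curious, v1 of the proxy protocol is just a human-readable preamble in front of the client bytes. A minimal Go parse might look like this (sketch, v1 only, ignores the "PROXY UNKNOWN" case; production code should use a maintained library):

    package proxyproto

    import (
        "bufio"
        "errors"
        "strings"
    )

    // ReadProxyHeader consumes a PROXY protocol v1 line like
    //   "PROXY TCP4 203.0.113.7 10.0.0.5 41235 443\r\n"
    // and returns the original source address the load balancer saw.
    func ReadProxyHeader(br *bufio.Reader) (srcIP, srcPort string, err error) {
        line, err := br.ReadString('\n')
        if err != nil {
            return "", "", err
        }
        fields := strings.Fields(strings.TrimSpace(line))
        // PROXY <TCP4|TCP6> <src> <dst> <srcport> <dstport>
        if len(fields) != 6 || fields[0] != "PROXY" {
            return "", "", errors.New("not a PROXY v1 preamble")
        }
        return fields[2], fields[4], nil
    }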

Proxy protocol is regrettably problematic from time to time, however. By default, Kubernetes tries to be "smart" about routing. If you have a LoadBalancer with internal IP 10.123.234.56 and external IP 54.43.32.21, traffic originating in the cluster with a destination address 54.43.32.21 does not transit the cloud provider's load balancer, it just gets rewritten to the internal IP address 10.123.234.56. This means that it doesn't use the proxy protocol, and the front proxy rejects the connection.

This is generally not a problem, but will be annoying if you host an internal container registry in-cluster, because containers will be named for the external container registry and will thus resolve to the external IP address. The connection will skip your cloud provider's load balancer which adds the proxy header, and your front proxy will reject the connection, causing failing pulls. Most people will never hit this, because their cloud provider just provides a container registry (when I set up all my container stuff, my cloud provider of choice didn't offer this; they do now).

But if you ever think internal applications will intend to use the public Internet to connect to other internal applications, watch out. This can come up if you do OIDC things internally, for example. (Apps will want to grab the .well-known configuration keys and JWKS key material from the external address that it will eventually redirect clients to. You will be confused when the configuration fails to load.)

In conclusion, what a mess.


That Envoy doc is... impressive. I'll find a place to link to it in the post.


Good find, OP. That was a breakdown well worth reading.


Short of using transparent IP proxies everywhere, or introducing digitally signed headers, there's no way out of this conundrum.


TL;DR: We use Cloudflare so I should replace my convoluted proxy whitelist with a copy of CF-Connecting-IP into the request's IP field.



