??? Is that a joke? Don't do that! That's just as bad as magic quotes.
> no register globals
It hasn't existed for quite a while now in PHP.
Other than that I quite agree with what you wrote.
Most of the criticism of PHP is simply a desire to be special. If lots of people are using something, you want to make sure not to, so that you feel special. Now that you've decided not to use it, you need to justify it to yourself. And people find all sorts of real criticisms in the process - yet none of them matter!
> > default to HTML escaping on output
> ??? Is that a joke? Don't do that! That's just as bad as magic quotes.
Microsoft's Razor and Python's Django templates have both shown that HTML-escaping-by-default can be cleanly done in a way that is not magical and that is highly reliable. I'm not honestly sure what you're protesting here.
In a way, "default escaping" isn't even the way to think of it. Everywhere there are functions to emit raw characters and functions to emit them HTML-encoded. "Default" escaping just means that the function that emits HTML-encoded characters is easier to get at and smoother to use than the raw one.
I understand the debates Django had about changing what the unspecified output function did, but nowadays if you're building a new template language of any kind it's an absolute no-brainer: Make it easier to encode than to bypass encoding. The alternative is just awful.
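To make that concrete, here is a minimal Python sketch. The names `emit` and `emit_raw_i_know_what_i_am_doing` are made up for illustration and are not taken from Razor or Django; the only point is that the short, convenient call encodes and the bypass is the one that takes effort to type.

```python
import html


def emit(value: object) -> str:
    """The convenient default: always HTML-encode the value."""
    return html.escape(str(value), quote=True)


def emit_raw_i_know_what_i_am_doing(markup: str) -> str:
    """The deliberately clumsy bypass: returns the markup unchanged."""
    return markup


user_input = '<script>alert("x")</script>'

# The easy path is the safe path.
page = "<p>" + emit(user_input) + "</p>"

# The unsafe path is still available, but nobody types this by accident.
trusted = emit_raw_i_know_what_i_am_doing("<em>fixed markup</em>")
```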
If I make a string: '<P>' . $var . '</P>' - does it know to encode the variable, and not the entire string?
What if I assign that string in a variable, and then output it later?
I'm not convinced this can be done well. Maybe if all you do is make some templates and fill them in you could do it. But I do a lot more than that: I output dynamically built HTML all the time.
Just give people a very easy and shortly named function for escaping.
Just needs a bit of type system magic. Don't use the same type for escaped and unescaped strings. (And don't use the same type for user generated input before and after it's scrubbed / escaped of any nastiness.)
Ask any Haskell weeny for details.
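For those who don't have a Haskell weeny handy, a rough Python sketch of the same idea. This is a run-time approximation, not the compile-time guarantee Haskell would give you, and the `Html` class and `_to_markup` helper are hypothetical names. Plain strings are treated as untrusted and get escaped at the moment they are concatenated with markup, so only the variable is encoded, not the surrounding tags.

```python
import html


class Html:
    """Markup that is already safe to send to the browser."""

    def __init__(self, markup: str) -> None:
        self.markup = markup

    def __add__(self, other: "Html | str") -> "Html":
        # Html + something: escape the right-hand side if it is a plain string.
        return Html(self.markup + _to_markup(other))

    def __radd__(self, other: "Html | str") -> "Html":
        # something + Html: escape the left-hand side if it is a plain string.
        return Html(_to_markup(other) + self.markup)

    def __str__(self) -> str:
        return self.markup


def _to_markup(value: "Html | str") -> str:
    """Html passes through untouched; anything else is escaped."""
    if isinstance(value, Html):
        return value.markup
    return html.escape(str(value))


var = "a < b"
paragraph = Html("<p>") + var + Html("</p>")
```

The result is still an `Html` value, so it can be stored in a variable and output later without being escaped a second time.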
Also in your example, you'd probably be better off if your language knew about the HTML structure, e.g. something like P($var), instead of putting the tags in as strings.
I like this, although when you concatenate strings, does it actually concatenate them (and lose the type info)? Does it escape them, and then concatenate? Or does concatenation actually make a closure, which is only executed upon output? (And does doing that make things memory heavy?)
Haskell is on my next language to learn list.
> something like P($var)
Ugh. I hate that. I've spent years learning HTML, I want to write HTML, not some other language that looks like it.
Yes, judging by your questions Haskell is a good language for you to learn. To give you a sneak preview: You can use the type system in such a way that your source program will safely discriminate between the two types of strings (e.g. tainted and untainted), but there won't be anything left in the compiled assembly (tags or wrapping or whatever).
Also about the HTML: You can of course use different syntax for what I proposed, one that's closer to actual HTML. But I think as long as the syntax tree is preserved, it's still close enough to HTML for me, and all your accumulated knowledge about HTML is still applicable. (Not to be derisive, but if your years of learning are no longer of any use upon such a cosmetic change, you should probably examine your level of comprehension.)
$foo = <p>$textvar</p>;
$foo->append(<span class='will-end-up-before-close-of-p'>Hello!</span>); // etc.
Mozilla proposed to add XML literals to JavaScript at one point, which didn't take off for security reasons, but server-side it's a different ballgame... maybe it could be worked out? Hmm.
Wow, that's even uglier. But you did not only propose an alternative syntax for HTML, but also for its manipulation. So that's a good enough excuse.
Have you looked at how Racket (Scheme, Lisp) deals with encoding HTML in S-expressions? I find that rather nice, and even prefer it to plain HTML or XML. Racket is a fine language for manipulating S-expressions, too.
Even with static typing, you might end up implementing this using run-time support. (Of course the holy grail is to compile away all type information. But that's not always attainable. Even Haskell's ghc compiler keeps some information around for runtime. Something to do with typeclasses, if you want to look up the details.)
> Don't use the same type for escaped and unescaped strings.
And if you can't extend your type system to make this work, do it in your head, mutating the names of variables to help you keep it straight. For example, esStr and unStr are not of the same type, and moving data from one to the other without conversion is always an error.
This reminds me of Charles Simonyi's classic article on Hungarian Notation. I know that style gets criticized a lot, but that's usually when it has been used inappropriately. If you have a language with a weak type system then a sensible variable prefix convention can help a lot.
I'd say don't do that, refactor into templates instead. With your preferred approach to generating markup, you're largely on your own in protecting against XSS. Hopefully all the developers using your code are awesome at spotting and pro-actively dealing with XSS issues.
Still, with an HTML-escape-everything default, you can either turn it off or use a raw "I know what I'm doing" method instead.
In Razor at least you'd create HtmlHelpers to output html, which returns a string that the system knows is html and you've already dealt with. It knows not to escape the string as you've explicitly said the string you're creating is markup.
You'd do something like @Html.SalesWidget("90% off today") with the SalesWidget being responsible for escaping the string.
Also I don't really get your argument, why not give people a really short and easy way of not escaping a string instead? The opposite of what you're suggesting is just as easy and far safer. You're less liable to accidentally muck up.
The hopefully obvious answer to your question is to not generate markup in PHP (or any server-side language). These are the kinds of questions that Backbone, Spine, Ember, etc. attempt to solve. You should look toward separating view concerns from your business logic and stop procedurally generating HTML in PHP.
Perhaps what we need is a language/platform that has built in strings that track not just the code page type encoding, but some kind of "intent assertion" as well -- is the string intended to be encoded for a particular output? Combining an "unknown" string with an HTML (or SQL, or PostScript, or JSON/JavaScript, ...) string would produce an exception.
Such a mechanism would have to include encoding functions (and assertion override functions), of course.
It seems this would help alleviate many types of fill-in-the-blank injection problems as well.
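A rough Python sketch of what such intent-tracking strings might look like. Everything here (`Tagged`, `IntentMismatch`, `encode_html`, `assert_html`) is a hypothetical name invented for the illustration, not an existing library, and a real design would need far more than two intents.

```python
import html
from dataclasses import dataclass


class IntentMismatch(TypeError):
    """Raised when strings with different intents are combined."""


@dataclass(frozen=True)
class Tagged:
    text: str
    intent: str  # e.g. "unknown", "html", "sql"

    def __add__(self, other: "Tagged") -> "Tagged":
        if not isinstance(other, Tagged):
            return NotImplemented  # Python then raises a plain TypeError
        if other.intent != self.intent:
            raise IntentMismatch(
                f"cannot combine {self.intent!r} with {other.intent!r}"
            )
        return Tagged(self.text + other.text, self.intent)


def encode_html(value: Tagged) -> Tagged:
    """Encoding function: turns an 'unknown' string into an 'html' one."""
    if value.intent != "unknown":
        raise IntentMismatch(f"expected 'unknown', got {value.intent!r}")
    return Tagged(html.escape(value.text), "html")


def assert_html(text: str) -> Tagged:
    """Assertion override: the programmer vouches that this is markup."""
    return Tagged(text, "html")


name = Tagged("<Bob>", "unknown")
greeting = assert_html("<p>") + encode_html(name) + assert_html("</p>")
```

Writing `assert_html("<p>") + name` without the `encode_html` call raises `IntentMismatch` instead of silently producing an injectable string.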
This problem has already been solved in the Haskell ecosystem [1]. For example, you get typesafe URLs so that if you have a standard query like myapp.com/person/345 you can't mistakenly misuse 345 as an article id. Every input string is tracked by the type system so the possibility for escape issues, injection attacks or cross site scripting exploits to sneak in is minimal. Static types also make sure that internal links can not be broken - if for example you decide to change the above URL to myapp.com/getperson instead, your application won't compile until you've fixed every other part that still references the old link .../person, and so on.
Not to mention the (also type safe) dead easy to use persistence framework.
I'm still in the process of evaluating different solutions for my next web project but so far I'm pretty sure this is gonna be my go-to framework in the future.
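For readers who don't know Haskell, the typed-ID part of this can be approximated in Python. This is only a sketch with hypothetical names (`PersonId`, `ArticleId`, `person_url`), and the mix-up is caught at run time (or by a static checker such as mypy), not by the compiler as in the Haskell framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonId:
    value: int


@dataclass(frozen=True)
class ArticleId:
    value: int


def person_url(person: PersonId) -> str:
    """The only place that knows what a person link looks like."""
    if not isinstance(person, PersonId):
        raise TypeError(f"expected PersonId, got {type(person).__name__}")
    return f"/person/{person.value}"


link = person_url(PersonId(345))
```

Because every internal link goes through `person_url`, changing the route means editing one function. Passing `ArticleId(345)` raises a `TypeError` rather than producing a link to the wrong record.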
> ??? Is that a joke? Don't do that! That's just as bad as magic quotes.
No, it's not. Magic quotes are a pain to reverse. HTML escaping can be built into the echo/print/<?= .. ?> commands, so that it is:
- A setting that can be totally turned off at run-time, if you want;
- Easily overridden on a case-by-case basis with a 'rawecho' function that
does *not* escape anything.
The problem with HTML escaping by default is that not all output needs HTML escaping. JavaScript strings need to be JS-escaped, and sometimes HTML-escaped as a second (or even first) step, to get the correct encoding for the specific context(s) the output actually ends up in within the browser. The same goes for CSS, URIs, VBScript, parameters, etc.
HTML escaping is not the one and only escaping strategy that magically makes everything safe. So any automated system would need to incorporate overrides on a per variable basis.
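To illustrate how different the contexts are, here is a small Python sketch using only the standard library. The three function names are hypothetical, and this is nowhere near a complete XSS defence (CSS and HTML attribute contexts are not covered); it only shows that the same value needs a different encoding in each place.

```python
import html
import json
from urllib.parse import quote


def for_html_text(value: str) -> str:
    """Encode for the text content of an HTML element."""
    return html.escape(value)


def for_url_component(value: str) -> str:
    """Percent-encode for a single URL path or query component."""
    return quote(value, safe="")


def for_js_string_in_script(value: str) -> str:
    """Encode as a JavaScript string literal inside a <script> block.

    json.dumps produces a quoted literal; replacing "<" afterwards keeps
    a value such as "</script>" from closing the script element early.
    """
    return json.dumps(value).replace("<", "\\u003c")


value = "a&b </script>"
as_html = for_html_text(value)
as_url = for_url_component(value)
as_js = for_js_string_in_script(value)
```

An automated system that applied only `for_html_text` everywhere would produce the wrong result in the other two contexts, which is why a per-variable override is needed.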
"And people find all sorts of real criticisms in the process - yet none of them matter!"
I'd like to mention something similar that I found in human behavior with respect to manufacturing jobs that we used to do for customers at a company that I was at. What we found was that people had 1 or 3 major complaints about something that were pretty valid. But what they did then was feel that they needed to build a strong case, so they ended up coming up with as many nits as they could, as if that somehow made the true concerns stronger. Just my experience, but I have found that if you have a true concern about something, it is best to simply focus on trying to get that fixed. If you throw everything in, the other party sometimes feels that they will never be able to please you and then they don't even try to fix anything. YEMV.