Request for help: regexp
Pekka
5th of October 2006 (Thu), 22:03
In EE 2 stylesheet.php there is some code that removes all style start and end tags, plus comment markers, before the stylesheet is sent out. The current code is very simple and does not tolerate anything other than the exact <style> tag used in the default templates.
$stylesheet_html = str_replace("<style type=\"text/css\" media=\"screen\">","",$stylesheet_html); // very restricted. someone who knows regexp could make these idiot proof!
$stylesheet_html = str_replace("</style>","",$stylesheet_html);
$stylesheet_html = str_replace("<!--","",$stylesheet_html);
$stylesheet_html = str_replace("-->","",$stylesheet_html);
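Here is a minimal sketch that makes the limitation concrete. It runs the four original lines against a hypothetical template whose opening tag omits media="screen". The sample CSS is invented for illustration.

```php
<?php
// Hypothetical template: same structure as the default, but no media attribute.
$stylesheet_html = "<style type=\"text/css\">\n<!--\nbody { color: #333; }\n-->\n</style>";

// The four exact-match replacements from stylesheet.php.
$stylesheet_html = str_replace("<style type=\"text/css\" media=\"screen\">", "", $stylesheet_html);
$stylesheet_html = str_replace("</style>", "", $stylesheet_html);
$stylesheet_html = str_replace("<!--", "", $stylesheet_html);
$stylesheet_html = str_replace("-->", "", $stylesheet_html);

// The opening tag survives because it does not match the hard-coded string exactly.
echo $stylesheet_html;
```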
Does anyone know a regexp that would remove every possible variation of the <style> tag? All help is appreciated! Thanks!
DavidW
5th of October 2006 (Thu), 23:22
Clearly the regex needs to be case insensitive for the tags (even though anything other than lower case breaks XHTML compliance), and you can get case insensitivity by using eregi_replace().
Perl-compatible regular expressions (PCRE) are in some ways more powerful, faster and binary safe, but not everyone necessarily has PCRE enabled in PHP.
If you want to match any <style...> tag caselessly - which is, I think, the only latitude you need, then:
$stylesheet_html = eregi_replace("<style[^>]*>","",$stylesheet_html);
should do the job (I've checked this regex in a UNIX shell using grep, but not in PHP).
[^>] means any character except >. The * after it allows zero or more repeats, and the final > requires the match to end with a >.
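For readers on current PHP, here is a sketch of the same opening-tag pattern using preg_replace(). This matters because eregi_replace() was deprecated in PHP 5.3 and removed in PHP 7.0. The function name strip_style_open_tags() and the sample tags are hypothetical. The /i modifier plays the role that the "i" in eregi_replace() plays.

```php
<?php
// Remove any opening <style ...> tag, regardless of attributes or letter case.
function strip_style_open_tags(string $html): string
{
    // <style  literal text
    // [^>]*   zero or more characters that are not '>'
    // >       the end of the tag
    // /i      case-insensitive match
    return preg_replace('/<style[^>]*>/i', '', $html);
}

$variants = [
    '<style>',
    '<style type="text/css">',
    '<STYLE type="text/css" media="screen">',
    '<Style media="all">',
];

foreach ($variants as $tag) {
    var_dump(strip_style_open_tags($tag . 'p { margin: 0; }'));
}
```

Because [^>]* cannot consume a >, each match stops at the end of the tag and never runs into the CSS that follows it.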
I don't see any need to change the other three lines, as you hinted in your post.
Does this do what you want?
David
DavidW
5th of October 2006 (Thu), 23:29
Thinking further, defensive coding suggests using a case-insensitive match for the second line as well, to make sure that </style> is always stripped. str_ireplace() should do the job there.
We now have the following snippet:
$stylesheet_html = eregi_replace("<style[^>]*>","",$stylesheet_html);
$stylesheet_html = str_ireplace("</style>","",$stylesheet_html);
$stylesheet_html = str_replace("<!--","",$stylesheet_html);
$stylesheet_html = str_replace("-->","",$stylesheet_html);
(I've lost the indenting - throw it back in at will).
David
Pekka
6th of October 2006 (Fri), 18:03
Can't use str_ireplace, as it is only available in PHP 5 or later.
My RISC way of coding would suggest
$stylesheet_html = eregi_replace("<style[^>]*>","",$stylesheet_html);
$stylesheet_html = str_replace("</style>","",$stylesheet_html);
$stylesheet_html = str_replace("</STYLE>","",$stylesheet_html);
$stylesheet_html = str_replace("</Style>","",$stylesheet_html);
$stylesheet_html = str_replace("<!--","",$stylesheet_html);
$stylesheet_html = str_replace("-->","",$stylesheet_html);
but isn't there a way to do it with a regexp? I don't know what the case-insensitive switch is in regexp, so this is just a guess :) :
$stylesheet_html = eregi_replace("</[s|S][t|T][y|Y][l|L][e|E]>","",$stylesheet_html);
DavidW
6th of October 2006 (Fri), 18:27
In POSIX extended regexes (as opposed to Perl regexes), the function you call determines case sensitivity: eregi_replace() is case insensitive, and ereg_replace() is the regular, case-sensitive function.
Just use $stylesheet_html = eregi_replace("</style>","",$stylesheet_html); for the second line - no need to mess around with anything else. The first line I gave above is already case insensitive because of the use of eregi_replace().
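With PCRE the situation is reversed. Case sensitivity belongs to the pattern, not the function, and this is the "switch" asked about above. The sketch below contrasts a case-sensitive pattern, the trailing /i modifier, and the inline (?i) option. The sample string is hypothetical. The # delimiter is used so the / in </style> needs no escaping.

```php
<?php
$css = "</STYLE>a{}</Style>b{}</style>";

$sensitive     = preg_replace('#</style>#', '', $css);     // only the lower-case tag is removed
$with_modifier = preg_replace('#</style>#i', '', $css);    // /i: whole pattern caseless
$with_inline   = preg_replace('#(?i)</style>#', '', $css); // inline switch, same effect

echo $sensitive, "\n";     // </STYLE>a{}</Style>b{}
echo $with_modifier, "\n"; // a{}b{}
echo $with_inline, "\n";   // a{}b{}
```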
In other words, we have:
$stylesheet_html = eregi_replace("<style[^>]*>","",$stylesheet_html);
$stylesheet_html = eregi_replace("</style>","",$stylesheet_html);
$stylesheet_html = str_replace("<!--","",$stylesheet_html);
$stylesheet_html = str_replace("-->","",$stylesheet_html);
As a final teaching point, [S|s] is incorrect - it should be [Ss]. Inside square brackets | is just a literal character, so [S|s] also matches a pipe. You're getting mixed up with the syntax for alternatives - (bar|foo). Don't do it anyway - it's messy - simply use a case-insensitive regex.
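Here is a short sketch of the bracket-expression point. It uses preg_match(), but a bracket expression treats | as an ordinary character in POSIX extended regexes too. The test characters are arbitrary.

```php
<?php
// [s|S] is a set of three characters: 's', '|' and 'S'.
// [Ss] is a set of two characters: 's' and 'S'.
foreach (['s', 'S', '|'] as $ch) {
    printf("%s  [s|S]: %d  [Ss]: %d\n",
        $ch,
        preg_match('/^[s|S]$/', $ch),
        preg_match('/^[Ss]$/', $ch));
}
// Alternation needs parentheses instead: (s|S) matches only 's' or 'S'.
```

The last row of output shows the difference. A lone | matches [s|S] but not [Ss].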
David
Pekka
6th of October 2006 (Fri), 19:20
That looks good and works. Thanks!! :)
segal3
9th of October 2006 (Mon), 18:25
Should we be updating our code somewhere?
DavidW
15th of December 2006 (Fri), 08:15
This is included in 2.02 - it makes the code more robust.
David
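Anyone porting this to PHP 7 or later will find that eregi_replace() no longer exists. The sketch below shows one way the four-line snippet could be expressed there. It is not the code shipped in 2.02. The function name strip_style_wrappers() and the sample template are hypothetical. str_ireplace() is available from PHP 5 onward.

```php
<?php
function strip_style_wrappers(string $stylesheet_html): string
{
    // Opening <style ...> tag with any attributes, in any letter case.
    $stylesheet_html = preg_replace('/<style[^>]*>/i', '', $stylesheet_html);
    // Closing </style> tag in any letter case.
    $stylesheet_html = str_ireplace('</style>', '', $stylesheet_html);
    // HTML comment markers wrapped around the CSS.
    return str_replace(['<!--', '-->'], '', $stylesheet_html);
}

$template = "<STYLE type=\"text/css\" media=\"all\">\n<!--\nh1 { font-size: 2em; }\n-->\n</Style>";
var_dump(strip_style_wrappers($template));
```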