For an example of insecure PHP, have a look at the following code.
This is in response to some of the posters here....
/*
usage : function html_filter($content, &$result, $mode, &$Warning,$AllowedTags='',&$MissingTags)
Where : - $content is the original string content.
- $result is the filtered output string.
- $mode is a bitmask that defines the behaviour of the function.
- $Warning is a returned array of warning messages describing every change done to $content.
- $AllowedTags is a list of tags that are allowed to be present in content, each with its optional allowed attributes.
- $MissingTags is a returned string that will contain any closing tag that is missing from $content.
$mode must be defined as (MODE_STRIP|MODE_REPLACE) | (MODE_ALL|MODE_PARTIAL)
(MODE_STRIP|MODE_REPLACE) : defines the action to be taken when filtering. Either strip or replace.
(MODE_ALL|MODE_PARTIAL) : defines which tags are filtered.
MODE_ALL means filter all tags.
MODE_PARTIAL allows the tags listed in $AllowedTags to be present.
The return value is either RC_OK (no tags needed to be filtered) or RC_MODIFIED (something needed to be filtered or added).
For a more detailed use of html_filter, have a look at mbs.php3, a message board system that relies on this function for
filtering posts.
History :
25 March 2001 :
- Added the $tabletags and some code in html_filter to filter tags that are only allowed inside HTML tables if they
appear outside a table (Maarten).
1 November 2002 :
- bug correction : Warning: Undefined variable: xxx in /homepages/37/d23823115/htdocs/http/mbs/filter.inc on line 266
Contributors :
- Maarten
*/
define('MODE_STRIP', 0x01);
define('MODE_REPLACE', 0x00);
define('MODE_ALL', 0x02);
define('MODE_PARTIAL',0x00);
define('RC_OK', 0x00);
define('RC_MODIFIED', 0x01);
define('RC_MALFORMED', 0x02);
// This array *should* contain every possible tag. It is only used in the first pass of the filtering to tell
// '<' as a tag opener apart from any other use.
$tags=array('b','u','i','pre','br','p','center','s','small','strike','sub','sup','tt','table','tr','td','th','font','hr','a','ul','ol','li','img','blockquote','code','em','strong','applet', 'kbd');
// Tags that are only allowed inside a table. They will be filtered if they
// appear outside a table. All elements must be uppercase.
$tabletags = array('TR', 'TD', 'TH');
function show_allowedtag($AllowedTags)
{
echo("<ul>\n");
foreach ($AllowedTags as $Tag => $Attribute) //each() was removed in PHP 8
{
echo("<li>&lt;$Tag&gt;");
if (!empty($Attribute)) echo(" with ($Attribute) attribute allowed");
}
echo("</ul>\n");
}
function html_filter($content, &$result, $mode, &$Warning,$AllowedTags='',&$MissingTags)
{
$MissingTags='';
global $tags;
global $tabletags;
//------new tags management routine--------
//We will replace every occurrence of < and > that doesn't enclose a tag (listed in $tags) by their html equivalent.
//$tags should be as complete as possible. We are just trying to distinguish <tag> from other usages of < and >
//(normal 'less/greater than' meaning)
$newcontent='';
// '!' will be used as our preg expression delimiter, so we better replace them all by their html entity.
$content = str_replace('!','&#33;',$content);
//Now we loop through the content to locate every <****>
//We should get something like '$reg[1]<$reg[2]>' Note the 's' :-)
//Where reg[1] is the part before the tag and reg[2] the *tag* itself
while(preg_match('!^(.*?)<([^><]*?)>!s',$content,$reg))
{
$chunk = preg_replace('!<!','&lt;',$reg[1]); //let's replace all < and > in $reg[1] as they aren't tag delimiters
$chunk = preg_replace('!>!','&gt;',$chunk);
$newcontent.=$chunk;
//Remove reg[0] from content (so the loop will hopefully end sooner or later :-)
$content = preg_replace('!^'.preg_quote($reg[0]).'!','',$content);
//Now let's work on $reg[2] (the 'maybe' tag);
$chunk=$reg[2];
//if we can isolate a tag inside the < >, then we will check it.
if (preg_match('!\s*\/?([^\s]*)!',$chunk,$reg)) {
//If the *tag* found is really a tag, don't touch the < > otherwise replace them
if (in_array(strtolower($reg[1]),$tags)) $newcontent.='<'.$chunk.'>'; else $newcontent.='&lt;'.$chunk.'&gt;';
} else $newcontent .= '&lt;'.$chunk.'&gt;'; // nothing could be located inside the < > so we replace them.
}
//something could be left in $content (the ending part).
$chunk = preg_replace('!<!','&lt;',$content); //Simply replace < and >
$chunk = preg_replace('!>!','&gt;',$chunk);
$newcontent.=$chunk;
//copy 'filtered' content back
$content = $newcontent;
// -----Old malformation checking routine---------
// This one prevents any < > other than tag ones from being present in the content ... so, not very useful :-(
/*
//Let's first check if the tag is not malformed (maybe intentionally :-)
//remove all chars except < and >
$chunk = ereg_replace('[^<>]','',$content);
//now we should stay with only <> pairs, let's remove them
$chunk = ereg_replace('<>', '',$chunk);
//at this point we should stay with an empty line. Check it
if ($chunk!='') return RC_MALFORMED; //Line is NOT empty. We got a problem !!
*/
//------ This is the main filter section ------
//It will evaluate every tag in content and will either replace, remove or leave it depending on $mode and the $AllowedTags list
//If any tag is replaced or removed, the action will be 'logged' in $Warning.
$modified = FALSE; //did we modify anything ?
if ($AllowedTags=='') $mode|=MODE_ALL; //No tag list supplied, so we will work on ALL tags.
else if (count($AllowedTags)==0) $mode|=MODE_ALL; //if tag list is empty then we have to process ALL tags
$Filtered = ''; //This is where we will store the filter's result
//The following split is a smart trick i found at ....
//It will properly separate tag text and normal text.
//Normal text will be in even numbered array elements while tag text will be in odd ones
//and that in every possible case :-)
$line = preg_split('!<[[:space:]]*|[[:space:]]*>!',$content); //split() was removed in PHP 7, preg_split() takes the same pattern
//Inside a PRE tag, we won't replace \n by <BR> to prevent blank lines
$InPre = FALSE; //at the beginning we can't be inside a <pre> tag
$InTable = 0; // Keeps track of the nesting of TABLEs in order to filter
// out any <tr> or <td> that are not inside a table. This variable is
// increased at every <table> and decreased at every </table>. At the
// beginning, we're not inside a table.
for ($i=0; $i<count($line); $i++) //Let's loop for every chunk of text
{ //Odd chunks = TAGS / even chunks = TEXT
if ($i%2) { //We have to process a tag text.
//let's first check if this is a <pre> tag.
if (preg_match("!^pre(?:\\s+.*|)\$!i",$line[$i])) $InPre = TRUE;
//or a </pre>
if (preg_match("!^/pre(?:\\s+.*|)\$!i",$line[$i])) $InPre = FALSE;
$TagOK=FALSE; //By default the TAG is not allowed
$OffendingAttr=''; //Just to tell people what was wrong within this tag
if (!($mode&MODE_ALL)) { //If we are running in partial mode, we must check the tag against the allowed ones
foreach ($AllowedTags as $tag => $attribute_list) { //loop for each allowed tag (each() was removed in PHP 8)
if (preg_match("!^(/?)$tag(?:\\s+(.*)|)\$!i",$line[$i],$reg)) { //additional check need to be done on the tag attribute
$TagOK=TRUE;
if (empty($reg[2])) break; //if no attribute on this tag, then it is ok.
//If we got any attribute along this tag, let's process it.
$stuff = preg_replace('!".*?"!','""',$reg[2]); //replace all "xxxxx" with only "" (non-greedy match)
$stuff = preg_replace('!\s+!',' ',$stuff); //replace all multispace by 1 single space
$stuff = preg_replace('!\s?=\s?!','=',$stuff); //replace all ' = ' with '='
//we should now be left with something like 'attr=value attr="" attr=value attr'
$stuff = preg_replace('!=[^\s]*!','',$stuff); //now let's remove those "=something" parts
//now we should split on ' ' to get the array of attributes we have to check
$tag_attr_array = explode(' ',$stuff);
//Before going any further, let's check if current Tag have any attribute allowed
if (empty($attribute_list)) {
$TagOK=FALSE; //No, so this tag is rejected
$OffendingAttr = $stuff; //just show all attributes rejected
break; //no attribute allowed for this tag !!
}
$allowed_attr_array = explode(',',$attribute_list);
//let's check if all attributes are allowed for the current tag
foreach ($tag_attr_array as $attr) {
$AttrOK=FALSE; //By default attribute is not allowed
foreach ($allowed_attr_array as $allowed_attr) {
if (strcasecmp(trim($allowed_attr), $attr) == 0) { //Attribute name is in the allowed list (exact, case-insensitive match), continue
$AttrOK=TRUE;
break;
}
}
if (!$AttrOK) {
$TagOK=FALSE;
$OffendingAttr.=$attr." ";
}
}
break; // no need to keep looping through allowed tags
}
} //End loop for each tag
}
//The code below is from Maarten.
// Keep track of TABLEs (to be able to filter any <tr> or <td> tags that
// are outside of tables).
if ($TagOK && (strtoupper($tag) == 'TABLE')) {
// Only count table tags that will not be filtered.
if (isset($reg[1]) && ($reg[1] == '/')) {
// This is a closing table tag.
if ($InTable>0) $InTable--;
} else {
// This is an opening table tag.
$InTable++;
}
}
if ($TagOK) {
if (!$InTable) {
// This tag appears outside any tables, so <tr> or <td> are not
// allowed.
$touppertag = strtoupper($tag);
// Check the list of all tags that are not allowed outside any tables.
if (in_array($touppertag, $tabletags)) {
// It's a <tr> or <td> (or the closing version of one of these
// tags), and they are not allowed here, outside tables.
$TagOK=FALSE;
}
}
}
if ($TagOK && (strtoupper($tag)==$tag)) //We got an accepted tag. Tags whose key in $AllowedTags is uppercase are the ones that NEED to be closed sooner or later
{
if (isset($reg[1]) && ($reg[1]=='/')) { //This is a closing tag.
if (!isset($list[$tag]) || ($list[$tag]==0)) $TagOK=FALSE; //We can't have a closing tag before the opening one !!
else $list[$tag]--;
} else { //This is an opening tag. Increment tag counter
if (!isset($list[$tag])) $list[$tag]=0;//Bug correction 11/2002 :Warning: Undefined variable: list in /homepages/37/d23823115/htdocs/http/mbs/filter.inc on line 266
$list[$tag]++;
}
}
if ($TagOK) $Filtered.='<'.$line[$i].'>'; //Tag is allowed : append it !!
else {
$modified = TRUE; //tag is not allowed : filter it !!
if (!($mode&MODE_STRIP)) $Filtered.='&lt;'.$line[$i].'&gt;'; //That is : either strip or replace
if (empty($OffendingAttr)) {
$Warning[]="tag $line[$i] isn't allowed. It will be ".($mode&MODE_STRIP?'removed':'replaced');
}
else {
$Warning[]="tag $line[$i] isn't allowed with attribute $OffendingAttr. It will be ".
($mode&MODE_STRIP?'removed':'replaced'); }
//$Warning[$line[$i]]=(!empty($OffendingAttr)?$OffendingAttr:''); //Tag have been modified : warning
}
} //end if(i%2)
//This is non-tag text.. Just append it
else $Filtered .= $line[$i]; //($InPre?$line[$i]:nl2br($line[$i]));
} //end FOR loop
$result = $Filtered;
if (isset($list))
{
foreach ($list as $tag => $val)
{
while ($val>0) {$MissingTags.="</$tag>"; $Warning[]='missing closing </'.$tag.'> tag'; $val--; $modified=TRUE;} //a closing tag is missing, so something was modified
}
}
if ($modified) return RC_MODIFIED; else return RC_OK;
}
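The split trick in the main filter section is easier to follow in isolation. Below is a minimal standalone sketch, assuming PHP 7 or later and using preg_split() with the same pattern; the helper name split_tags_and_text is made up for this illustration and is not part of filter.inc.

```php
<?php
// Hypothetical helper for illustration only; not part of the original include file.
// Splits on "<" (plus any whitespace after it) or ">" (plus any whitespace before it).
function split_tags_and_text(string $content): array
{
    return preg_split('!<[[:space:]]*|[[:space:]]*>!', $content);
}

$parts = split_tags_and_text('Hello <b>world</b>!');

// Even indexes hold plain text, odd indexes hold the inside of a tag.
foreach ($parts as $i => $part) {
    echo ($i % 2 ? 'TAG : ' : 'TEXT: ') . $part . "\n";
}
```

The odd/even rule only holds because the first pass has already turned every stray < and > into entities, so each remaining < is followed by its own >.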
Source: code written 07/2000 by Laurent. Including the distribution terms:
Distribution Policy:
This include file is originally written 07/2000 by Laurent.
It was originally written as a contribution to Fravia's searchlores
phplab (http://www.2113.ch/phplab)
He can be reached at laurent30AThotmail.com or phplabAT2113.ch
Please send constructive critics, ideas or ameliorations, thanks.
This include file is freely distributable, as long as you keep
this unmodified Distribution Policy text somewhere visible within
the sources.
I wanted to show that PHP code really can, and should, be put through its paces, for example with the PHP Code Security Scan.
This is in keeping with the digital legacy of my online searchlore master of those days, F.R.A.V.I.A (RIP).
Sucuri has also put a useful scanner online. Anyone who doesn't check code doesn't deserve code!
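One thing worth checking by hand in any tag filter like the one above is whether attribute names are compared exactly or matched as a pattern. Here is a minimal sketch of the difference, assuming PHP 7 or later; both helper names are made up for this illustration and appear nowhere in filter.inc.

```php
<?php
// Hypothetical helpers for illustration only.

// Pattern-style check: the allowed name is searched for anywhere inside
// the attribute name, case-insensitively.
function attr_allowed_by_pattern(string $attr, array $allowed): bool
{
    foreach ($allowed as $name) {
        if (preg_match('!' . preg_quote($name, '!') . '!i', $attr)) {
            return true;
        }
    }
    return false;
}

// Exact check: the whole attribute name must equal an allowed name,
// ignoring case.
function attr_allowed_exact(string $attr, array $allowed): bool
{
    foreach ($allowed as $name) {
        if (strcasecmp($name, $attr) === 0) {
            return true;
        }
    }
    return false;
}

$allowed = ['align'];
var_dump(attr_allowed_by_pattern('valign', $allowed)); // 'align' occurs inside 'valign'
var_dump(attr_allowed_exact('valign', $allowed));      // 'valign' is not on the list
```

As far as I can read the filter, it also looks only at attribute names: the values are thrown away before the check, so an allowed href is accepted whatever it points to. That is worth a second look too.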