Friday, December 1, 2006

ASP.NET: Remove HTML Tags From a String

A lot of websites allow users to input text and submit it to the site.

These could be forums, blogs, content management systems and so on.

Imagine what happens if the user writes HTML into these form fields.

It could be perfectly harmless when used for styling, but it could also be used the wrong way.

A typical scenario is a user entering JavaScript that does harmful things, or embedding a style sheet that ruins the website's layout.

Injecting script this way is normally referred to as Cross-Site Scripting (XSS).

We have to mitigate that risk, and that's where regular expressions come to the rescue.

Here is a very simple method that strips all HTML tags from a string or just the harmful tags - you decide.

The method takes two parameters: the string that needs tag removal and a boolean flag that determines if harmless tags are allowed or not.

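public static string StripHtml(string html, bool allowHarmlessTags)
{
   // Nothing to strip from a null or empty string.
   if (string.IsNullOrEmpty(html))
     return string.Empty;

   // Remove only the harmful tags. The pattern was lost from this copy of the article;
   // the tag list below is an assumed reconstruction, so adjust it as needed.
   if (allowHarmlessTags)
     return System.Text.RegularExpressions.Regex.Replace(html, @"</?(?i:script|embed|object|frameset|frame|iframe|meta|link|style)(.|\n)*?>", string.Empty);

   // Otherwise remove every tag.
   return System.Text.RegularExpressions.Regex.Replace(html, "<[^>]*>", string.Empty);
}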
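
To see how the two modes differ, here is a small self-contained sketch. The class name StripHtmlDemo, the sample input and the list of harmful tags are assumptions made for illustration, not part of the original method.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripHtmlDemo
{
    // Assumed list of harmful tags, chosen for illustration only.
    // (?i:...) makes the tag names case-insensitive without touching the text itself.
    private const string HarmfulTags = @"</?(?i:script|iframe|object|embed|style)\b[^>]*>";

    public static string StripHtml(string html, bool allowHarmlessTags)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // Harmless tags allowed: strip only the harmful ones. Otherwise strip every tag.
        string pattern = allowHarmlessTags ? HarmfulTags : "<[^>]*>";
        return Regex.Replace(html, pattern, string.Empty);
    }

    public static void Main()
    {
        string input = "<b>Hello</b> <script>alert(1)</script>World";

        // Every tag is removed; the text between the tags stays.
        Console.WriteLine(StripHtml(input, false)); // Hello alert(1)World

        // Only the assumed harmful tags are removed; <b> survives.
        Console.WriteLine(StripHtml(input, true));  // <b>Hello</b> alert(1)World
    }
}
```

Note that only the tags themselves are removed. The text that was inside the script element is left behind as plain text, and the casing of the remaining text is preserved.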

You can add more harmful tags to the regular expression string if you'd like.
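
One way to make the tag list easier to extend is to build the pattern from an array of tag names. This is only a sketch: the class HarmfulTagFilter, the method RemoveHarmfulTags and the tag list itself are hypothetical names and values, not taken from the method above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HarmfulTagFilter
{
    // Hypothetical tag list for illustration; adjust it to your own policy.
    private static readonly string[] HarmfulTags =
        { "script", "iframe", "object", "embed", "style", "link", "meta" };

    // Built once: an optional slash, any listed tag name, then the rest of the tag.
    private static readonly Regex HarmfulTagPattern = new Regex(
        "</?(?:" + string.Join("|", HarmfulTags.Select(t => Regex.Escape(t))) + @")\b[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string RemoveHarmfulTags(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        return HarmfulTagPattern.Replace(html, string.Empty);
    }

    public static void Main()
    {
        // The upper-case IFRAME tags are removed, the paragraph is kept.
        Console.WriteLine(RemoveHarmfulTags("<p>Hi</p><IFRAME src=\"x\"></IFRAME>")); // <p>Hi</p>
    }
}
```

Adding a tag then means adding one string to the array. The \b after the tag name keeps a short name such as "link" from matching a longer tag name that merely starts with it.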

Enjoy.


Mads Kristensen currently works as a Senior Developer at Traceworks, located in Copenhagen, Denmark. Mads graduated from Copenhagen Technical Academy with a multimedia degree in 2003, but has been a professional developer since 2000. His main focus is on ASP.NET, but he is also responsible for WinForms, Windows services and web services in his daily work. A true .NET developer with great passion for the simple solution.

http://www.madskristensen.dk/


I don't know if you realised, but using your HTML remover also makes the whole text lowercase. I can see why you have used it, to remove both uppercase and lowercase <br>'s, but it outputs the text as lowercase...

Just a note..

Kind Regards,

Derek Hyams

Good job Mads

