The solution is quite simple:
1. Retrieve all the HTML tags using this pattern: <(.|\n)*?>
2. Replace them with an empty string and return the result.
Here's a C# function that does this:
// Requires: using System.Text.RegularExpressions;
private string StripHTML(string htmlString)
{
    // This pattern matches everything found inside HTML tags:
    // (.|\n) -> any character or a newline
    // *?     -> zero or more occurrences, non-greedy, meaning that the
    //           match stops at the first available '>' it sees, not at
    //           the last one
    // (if it stopped at the last one, a single match would run from the
    // first '<' to the last '>' and swallow the text between the tags)
    string pattern = @"<(.|\n)*?>";
    return Regex.Replace(htmlString, pattern, string.Empty);
}
Or with just one line of code:
string stripped = Regex.Replace(textBox1.Text, @"<(.|\n)*?>", string.Empty);
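For completeness, here is a minimal, self-contained sketch of the same idea that can be dropped into a project and called from anywhere. The class name HtmlStripper and both method names are my own placeholders, not part of the snippet above. The second method uses a different pattern, <[^>]*>, which I am swapping in only as an illustration: it also stops at the first '>' after each '<', so it should remove the same tags without needing the non-greedy quantifier.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names here are illustrative only.
public static class HtmlStripper
{
    // Same pattern as in the article: '<', then as few characters as
    // possible (newlines included), then the first '>'.
    private static readonly Regex LazyTag = new Regex(@"<(.|\n)*?>");

    // Assumed alternative: '<', any run of characters other than '>',
    // then '>'. A negated character class also matches newlines.
    private static readonly Regex NegatedClassTag = new Regex(@"<[^>]*>");

    // Removes every tag matched by the article's pattern.
    public static string StripHtml(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return LazyTag.Replace(html, string.Empty);
    }

    // Removes every tag matched by the negated-class pattern.
    public static string StripHtmlNegatedClass(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return NegatedClassTag.Replace(html, string.Empty);
    }
}
```

Calling HtmlStripper.StripHtml("<p>Hello, <b>world</b>!</p>") returns "Hello, world!", and a tag that is split across two lines is removed as well. Only the tags themselves are removed: any text between them, including the contents of script or style elements, is left in the result.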