Thursday, June 14, 2012

Find all links inside an HTML source using Google RE2

Here is a short example showing how to find all links in an HTML source using Google's regular expression library (RE2). Note that the pattern only matches href values written in single quotes:
-----------------------------------------------------------------
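#include <iostream>
#include <string>
#include <re2/re2.h>
using namespace std;
int main() {
string text=" html source goes here <a href='http://google.com/top-l/?search=xyz&x=t'>site link</a> testing <a   href= '../news/show?id=4'>ms</a>";
// Raw string literal, so \s and \w reach RE2 unchanged instead of being parsed as C++ escapes
re2::RE2 linksre(R"(<a[\s\w]*href=\s*'([\w:/.?=\-&%]*)'[\s\w]*>)");
string res;
re2::StringPiece html(text);
// FindAndConsume advances html past each match and stores the captured URL in res
while(RE2::FindAndConsume(&html, linksre ,&res))
{
cout << "("<<res<<")"<<endl;
}
}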
-----------------------------------------------------------------

Output:
(http://google.com/top-l/?search=xyz&x=t)
(../news/show?id=4)
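
If you want to try the same "find, then continue after the match" loop without installing RE2, here is a rough sketch using the standard library's std::regex as a stand-in. This is not RE2 code: find_links is a hypothetical helper name, and the capture group is simplified to "anything up to the closing single quote" rather than the character list used above, so treat it as an illustration of the loop and not as a drop-in replacement.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of RE2): collects the href value of every
// <a> tag whose href is written in single quotes.
// std::regex is used here only as a stand-in for RE2.
std::vector<std::string> find_links(const std::string& html)
{
    // Assumption: the capture is simplified to [^']* (anything up to the closing quote).
    static const std::regex links_re(R"(<a[\s\w]*href=\s*'([^']*)'[\s\w]*>)");
    std::vector<std::string> links;
    std::string::const_iterator pos = html.begin();
    std::smatch m;
    // Same idea as the RE2::FindAndConsume loop: search from pos,
    // keep the captured group, then move pos to the end of the match.
    while (std::regex_search(pos, html.end(), m, links_re)) {
        links.push_back(m[1].str());
        pos = m[0].second;
    }
    return links;
}
```

Calling find_links on the sample string from the example above should return the same two links, in the same order.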

