Welcome to dbForumz.com!

using PHP to parse through HTML

 
laredotornado

External


Since: Feb 19, 2005
Posts: 2



(Msg. 1) Posted: Sat Feb 19, 2005 12:49 pm
Post subject: using PHP to parse through HTML
Archived from groups: comp.lang.php

Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
attributes of anchor tags and SRC attributes of IMG tags. Does anyone
know of any libraries/freeware to help parse through HTML to find these
things. Right now, I'm doing a lot of "strstr" calls, but there is
probably a better way to do what I need.

Thanks for any help, - Dave

Andy Hassall

External


Since: Jan 11, 2004
Posts: 465



(Msg. 2) Posted: Sat Feb 19, 2005 9:02 pm
Post subject: Re: using PHP to parse through HTML

On 19 Feb 2005 11:49:24 -0800, laredotornado DeleteThis @gmail.com wrote:

 >Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
 >attributes of anchor tags and SRC attributes of IMG tags. Does anyone
 >know of any libraries/freeware to help parse through HTML to find these
 >things. Right now, I'm doing a lot of "strstr" calls, but there is
 >probably a better way to do what I need.

Haven't used it myself, but seen mentions of:

http://pear.php.net/package/XML_HTMLSax

... which looks possibly suitable from the description on the page.

--
Andy Hassall / <andy DeleteThis @andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool

laredotornado

External


Since: Dec 18, 2004
Posts: 16



(Msg. 3) Posted: Sat Feb 19, 2005 9:22 pm
Post subject: Re: using PHP to parse through HTML

Too bad none of the examples work. I untarred/uncompressed the file,
copied the folder to a public html directory and then every time I try
and launch an example, I get errors like

Warning: main(XML/HTMLSax/XML_HTMLSax_States.php): failed to open
stream: No such file or directory in
/usr/local/apache/htdocs/temp/XML/XML_HTMLSax.php on line 36

Fatal error: main(): Failed opening required
'XML/HTMLSax/XML_HTMLSax_States.php'
(include_path='.:/usr/local/lib/php') in
/usr/local/apache/htdocs/temp/XML/XML_HTMLSax.php on line 36


Andy Hassall wrote:
 > On 19 Feb 2005 11:49:24 -0800, laredotornado RemoveThis @gmail.com wrote:
 >
 > >Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
 > >attributes of anchor tags and SRC attributes of IMG tags. Does anyone
 > >know of any libraries/freeware to help parse through HTML to find these
 > >things. Right now, I'm doing a lot of "strstr" calls, but there is
 > >probably a better way to do what I need.
 >
 > Haven't used it myself, but seen mentions of:
 >
 > http://pear.php.net/package/XML_HTMLSax
 >
 > ... which looks possibly suitable from the description on the page.
 >
 > --
 > Andy Hassall / <andy RemoveThis @andyh.co.uk> / <http://www.andyh.co.uk>
 > <http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Dave Patton1

External


Since: Sep 18, 2004
Posts: 1



(Msg. 4) Posted: Sun Feb 20, 2005 12:17 am
Post subject: Re: using PHP to parse through HTML

laredotornado RemoveThis @gmail.com wrote in
news:1108842564.846225.81750@c13g2000cwb.googlegroups.com:

 > Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
 > attributes of anchor tags and SRC attributes of IMG tags. Does anyone
 > know of any libraries/freeware to help parse through HTML to find these
 > things. Right now, I'm doing a lot of "strstr" calls, but there is
 > probably a better way to do what I need.

Take a look at preg_split()
http://www.php.net/manual/en/function.preg-split.php

--
Dave Patton
Canadian Coordinator, Degree Confluence Project
http://www.confluence.org/
My website: http://members.shaw.ca/davepatton/
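
A minimal sketch of how preg_split() could be applied here, assuming reasonably well-formed markup: it splits the page into tag tokens and text tokens, so the tags can then be inspected one at a time. The function name split_html_tokens and the sample HTML are made up for illustration. A ">" inside an attribute value would confuse this simple pattern.

```php
<?php
// Hypothetical helper: split an HTML string into tag tokens and text tokens.
function split_html_tokens($html)
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the tags matched by the parenthesised
    // group in the result; PREG_SPLIT_NO_EMPTY drops the empty strings that
    // would otherwise appear between adjacent tags.
    return preg_split(
        '/(<[^>]*>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// Sample input (made up for this example).
$html = '<p>See <a href="page.html">this page</a><img src="pic.gif"></p>';

// Print only the tokens that are tags.
foreach (split_html_tokens($html) as $token) {
    if ($token[0] === '<') {
        echo $token, "\n";
    }
}
```

Each tag token can then be checked for an HREF or SRC attribute with a second, smaller pattern.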
steve




Joined: Jan 06, 2004
Posts: 655



(Msg. 5) Posted: Sun Feb 20, 2005 12:51 am
Post subject: Re: using PHP to parse through HTML

strstr is the LAST thing you want to do in this case! I don't know of any libraries, but you can use preg_match to grab the tags that you need.

If you are into PHP, learning preg_match and regular expressions in general is almost a must. It will substantially increase the power of your code.

steve
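
As a rough illustration of the preg_match suggestion, here is a sketch that uses preg_match_all to collect every HREF from anchor tags and every SRC from IMG tags. The function name find_links_and_images and the sample HTML are hypothetical. It assumes attribute values are enclosed in double quotes; single-quoted or unquoted values would need a looser pattern.

```php
<?php
// Hypothetical helper: collect HREF values of <a> tags and SRC values of
// <img> tags. Assumes double-quoted attribute values.
function find_links_and_images($html)
{
    $found = array('href' => array(), 'src' => array());

    // /i makes the match case-insensitive (<A HREF=...> also matches).
    // (?:[^>]*?\s)? skips any other attributes that come before the one wanted.
    if (preg_match_all('/<a\s+(?:[^>]*?\s)?href\s*=\s*"([^"]*)"/i', $html, $m)) {
        $found['href'] = $m[1];
    }
    if (preg_match_all('/<img\s+(?:[^>]*?\s)?src\s*=\s*"([^"]*)"/i', $html, $m)) {
        $found['src'] = $m[1];
    }
    return $found;
}

// Sample input (made up for this example).
$html = '<p><a href="one.html">1</a> <IMG SRC="pic.gif" alt="x"> '
      . '<a class="c" href="two.html">2</a></p>';

print_r(find_links_and_images($html));
```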
Simon

External


Since: Feb 20, 2005
Posts: 1



(Msg. 6) Posted: Sun Feb 20, 2005 7:40 am
Post subject: Re: using PHP to parse through HTML

 >
 > strstr is the LAST thing you want to do in this case! I don't know
 > of libraries, but you can use preg_match to grab the tags that you
 > need.
 >
 > If you are into php, learning preg_match and regular expressions in
 > general is almost a must.. it will substantially increase the power
 > of your code.
 >
 > steve
 >
 > --


Sorry, can you elaborate on your first statement?
Are you saying that "strstr" is slower than "preg_match"? What about
"strpos"?

The reason I ask is, if it was faster to look for a character in a string
using "preg_match", then why wouldn't strpos/strstr use it themselves?

I need to look for 2 characters in some data (case sensitive); what would
be the fastest way of finding the first occurrence?

$first = strpos( $data, $charA );
$sec = strpos( $data, $charB );
// strpos() returns false when the character is not found
if ( $first === false ) return $sec;
if ( $sec === false ) return $first;
return ($first<$sec)?$first:$sec;

// would there be a faster way to achieve the above using "preg_match"?

Simon
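
For the two-character search asked about here, two alternatives can be sketched: strcspn(), which scans the string once for either character, and preg_match() with a character class and PREG_OFFSET_CAPTURE. The function names and sample data are invented for illustration, and $charA and $charB are assumed to be single characters. No claim is made about which is faster; that would need measuring on real data.

```php
<?php
// Hypothetical helper: position of the first occurrence of either character,
// or false when neither is present, using a single strcspn() scan.
function first_of_two_strcspn($data, $charA, $charB)
{
    // strcspn() returns the length of the leading run of $data that
    // contains neither character, i.e. the offset of the first hit.
    $pos = strcspn($data, $charA . $charB);
    return ($pos < strlen($data)) ? $pos : false;
}

// Hypothetical helper: the same search expressed with preg_match().
function first_of_two_preg($data, $charA, $charB)
{
    // preg_quote() escapes characters that are special in a pattern.
    $pattern = '/[' . preg_quote($charA . $charB, '/') . ']/';
    if (preg_match($pattern, $data, $m, PREG_OFFSET_CAPTURE)) {
        return $m[0][1]; // byte offset of the match
    }
    return false;
}

// Sample input (made up for this example).
$data = 'hello <world> "x"';
var_dump(first_of_two_strcspn($data, '<', '"'));
var_dump(first_of_two_preg($data, '<', '"'));
```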
Andy Hassall

External


Since: Jan 11, 2004
Posts: 465



(Msg. 7) Posted: Sun Feb 20, 2005 9:40 am
Post subject: Re: using PHP to parse through HTML

On 19 Feb 2005 20:22:22 -0800, laredotornado DeleteThis @zipmail.com wrote:

 >Andy Hassall wrote:
 >> On 19 Feb 2005 11:49:24 -0800, laredotornado DeleteThis @gmail.com wrote:
 >>
 >> >Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
 >> >attributes of anchor tags and SRC attributes of IMG tags. Does anyone
 >> >know of any libraries/freeware to help parse through HTML to find these
 >> >things. Right now, I'm doing a lot of "strstr" calls, but there is
 >> >probably a better way to do what I need.
 >>
 >> Haven't used it myself, but seen mentions of:
 >>
 >> http://pear.php.net/package/XML_HTMLSax
 >>
 >> ... which looks possibly suitable from the description on the page.
 >
 >Too bad none of the examples work. I untarred/uncompressed the file,
 >copied the folder to a public html directory

That's not how you're supposed to install PEAR modules; here's an example how:

root@server:~# pear install http://pear.php.net/get/XML_HTMLSax-2.1.2.tgz
downloading XML_HTMLSax-2.1.2.tgz ...
Starting to download XML_HTMLSax-2.1.2.tgz (16,099 bytes)
.......done: 16,099 bytes
install ok: XML_HTMLSax 2.1.2

You could probably get away with unpacking to a public_html directory but
you'd need to fiddle with your include_path else you get errors like:

 >Warning: main(XML/HTMLSax/XML_HTMLSax_States.php): failed to open
 >stream: No such file or directory in
 >/usr/local/apache/htdocs/temp/XML/XML_HTMLSax.php on line 36

The examples work OK for me after installing through pear as above.
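
For the manual-unpack route, the include_path adjustment might look like the sketch below. It assumes the directory named in the error message above (/usr/local/apache/htdocs/temp) is the one that contains the XML/ folder; that layout is a guess from the error output, not something confirmed in this thread.

```php
<?php
// Assumption: this directory contains the unpacked XML/ folder
// (path taken from the error message quoted above).
$pearDir = '/usr/local/apache/htdocs/temp';

// Prepend the directory to the current include_path, so that relative
// paths such as 'XML/HTMLSax/XML_HTMLSax_States.php' are also looked up
// under $pearDir.
set_include_path($pearDir . PATH_SEPARATOR . get_include_path());

echo get_include_path(), "\n";

// With the path in place, the package file could then be loaded with:
// require_once 'XML/XML_HTMLSax.php';
```

Installing through pear as shown above avoids the need for this, since the files land in a directory that is already on the include_path.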

--
Andy Hassall / <andy DeleteThis @andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
petrovitch

External


Since: Feb 08, 2005
Posts: 1



(Msg. 8) Posted: Sun Feb 20, 2005 9:58 am
Post subject: Re: using PHP to parse through HTML

steve




Joined: Jan 06, 2004
Posts: 655



(Msg. 9) Posted: Sun Feb 20, 2005 6:52 pm
Post subject: Re: using PHP to parse through HTML

Simon wrote:
> > strstr is the LAST thing you want to do in this case! I don't know
> > of libraries, but you can use preg_match to grab the tags that you
> > need.
> >
> > If you are into php, learning preg_match and regular expressions in
> > general is almost a must.. it will substantially increase the power
> > of your code.
> >
> > steve
>
> Sorry, can you elaborate on your first statement?
> Are you saying that "strstr" is slower than "preg_match"? What about
> "strpos"?
>
> The reason I ask is, if it was faster to look for a character in a string
> using "preg_match", then why wouldn't strpos/strstr use it themselves?
>
> I need to look for 2 characters in some data (case sensitive); what would
> be the fastest way of finding the first occurrence?
>
> $first = strpos( $data, $charA );
> $sec = strpos( $data, $charB );
> // strpos() returns false when the character is not found
> if ( $first === false ) return $sec;
> if ( $sec === false ) return $first;
> return ($first<$sec)?$first:$sec;
>
> // would there be a faster way to achieve the above using "preg_match"?
>
> Simon


Simon, in 99% of cases speed does not matter, i.e. you can achieve good speed either way; it is not something I have ever had to worry about in my code. The point is that with preg_match and regex, you can achieve in one statement what would take 10 statements without regex. If you ever parse free text in any shape or form, regex is the way to go.

Your example above is simple, and if that is all you need, fine. But as soon as the text has spurious spaces, other characters that may or may not be present, and a whole bunch of other conditions outside your control, you need a much more powerful engine, and that is regex.
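
To make the "one statement versus ten" point concrete, here is a small comparison sketch. Both function names and the sample tags are invented for illustration. The strpos version only copes with the exact spelling href="...", while the single preg_match call also tolerates upper case, spaces around the equals sign, and single quotes.

```php
<?php
// Hypothetical helper: extract an href value using only strpos/substr.
// Works only when the tag contains exactly: href="value"
function href_with_strpos($tag)
{
    $needle = 'href="';
    $start = strpos($tag, $needle);
    if ($start === false) {
        return false;
    }
    $start += strlen($needle);
    $end = strpos($tag, '"', $start);
    if ($end === false) {
        return false;
    }
    return substr($tag, $start, $end - $start);
}

// Hypothetical helper: the same job in one preg_match call.
// \s* allows stray spaces, /i ignores case, (["\']) accepts either quote
// style and \1 requires the matching closing quote.
function href_with_regex($tag)
{
    return preg_match('/href\s*=\s*(["\'])(.*?)\1/i', $tag, $m) ? $m[2] : false;
}

// Sample tags (made up for this example).
$clean = '<a href="a.html">';
$messy = "<A HREF = 'a.html'>";

var_dump(href_with_strpos($clean), href_with_regex($clean));
var_dump(href_with_strpos($messy), href_with_regex($messy));
```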
All times are Pacific Time (US & Canada).