Nokogiri 1.5.0 on libxml2 2.7.8 reading HTML line numbers as "0" #613
Hello! Thanks for reporting this. I'm unable to reproduce it with:
So perhaps this problem is specific either to your HTML file (can you provide it?) or to your version of Ruby 1.9.2 (you have p0, I have p290; can you upgrade it?).
Thanks for the response! Unfortunately, upgrading our version of Ruby isn't really an option at this time: all of our code has been built against p0, and we probably won't be upgrading it for a while. Here is the HTML file:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta name="viewport" content="width=device-width; height=device-height, initial-scale=1.0; maximum-scale=1.0; user-scalable=0;" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Dockers</title>
</head>
<body class="fullpage-vert" onunload="javascript:clearInterval(audioLoop);">
<div id="container">
<div id="danceHolder">
<img id="danceVid" src="1-1.jpg" width="320" height="480" alt="" />
</div>
<div id="introHolder">
<img id="introVid" src="0-1.jpg" width="320" height="480" alt="" />
<div id="ctabg"></div>
<div id="cta1"></div>
<div id="cta2"></div>
<div id="cta3"></div>
<div id="phone"></div>
<div id="logo"></div>
</div>
</div>
</body>
</html>
```
Well, I didn't mean "upgrade your production servers"; I meant "can you try this on your dev machine with a different version of Ruby?" I'm trying to isolate the cause, and as I mentioned before, we differ on the patchlevel of Ruby we're running.

The HTML you included above doesn't match the Ruby script in your original post very well, since there are no "a" elements in it. That said, if I change the script to search for "div", I see the line numbers correctly, so we're left with either: a) it has something to do with the version of Ruby you're on, or

Please let me know if you're able to reproduce this with a newer version of 1.9.2!
Closing, pending more information from the original reporter.
@jeremy Can you expand a bit on why you're linking that PR to this issue? They're both discussing line numbers, but aren't directly related in a causal or solution-y way.
My bad. They naively appeared related in both cause and solution. |
I'm attempting to parse HTML files on CentOS 5, running Ruby 1.9.2, Nokogiri 1.5.0, and libxml2 2.7.8.
Parsing a file with syntax like this:
results in "0" for every line number. If I instead parse it as XML:
the line numbers are displayed correctly. I know there was a previous issue with libxml2 2.7.3, and I also know that CentOS ships with libxml2 2.6.2. However, I've followed the installation tutorial on the site and built Nokogiri against libxml2 2.7.8. Here's my nokogiri -v output:
I do still technically have libxml2 2.6.2 installed on the system via yum, but it doesn't look like it has affected the Nokogiri build. Is there some other step I should be taking?
As an aside, if I do end up having to use Nokogiri::XML to parse the HTML, will it work with HTML4 and HTML5 documents as well as XHTML?
Thanks.