Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Form elements not updated when making DOM changes #2140

Closed
aditsu opened this issue May 30, 2024 · 5 comments
Closed

Form elements not updated when making DOM changes #2140

aditsu opened this issue May 30, 2024 · 5 comments
Assignees
Milestone

Comments

@aditsu
Copy link

aditsu commented May 30, 2024

I am parsing an html document and making some changes to it, then extracting some form fields. It looks like the elements in FormElement are already set when the document is initially parsed, and remain unchanged.
Example code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class ParseTest {
	public static void main(final String... args) {
		final String html = "<html><body><form><div id=d1><input id=foo></div><input id=bar></form></body></html>";
		final Document doc = Jsoup.parse(html);
		doc.select("#d1").remove();
		final FormElement form = (FormElement) doc.selectFirst("form");
		form.appendElement("select").attr("id", "baz");
		System.out.println(form);
		System.out.println();
		System.out.println(form.elements());
	}
}

The result shows that the form now has the bar and baz fields, but its elements are still foo and bar. Notice that foo was removed even before selecting the form element.

Workaround (re-parsing): form = (FormElement) Jsoup.parse(form.toString()).selectFirst("form");
Jsoup version: 1.17.2

@aditsu
Copy link
Author

aditsu commented May 30, 2024

Possible solution:

  • make elements non-final
  • when making any DOM change, traverse the ancestors of affected nodes and set any FormElement's elements to null (affects performance?)
  • update the elements() method to reconstruct the elements field if null
  • update any other internal uses of elements to check for null or call elements() instead

OR...

...just add a method to reconstruct the elements of a form (will have to be called explicitly)

@jhy
Copy link
Owner

jhy commented Jul 1, 2024

Yes, the form control elements in FormElement.elements() are specifically associated during the parse; they are not a runtime property.

I think if they are required, a reconstruction method would be the most appropriate. I would be happy to review a PR if you want to add it.

But, can you tell me what you are actually using those elements for? There may be a simpler solution, or a more ergonomic change we can make.

@aditsu
Copy link
Author

aditsu commented Jul 8, 2024

But, can you tell me what you are actually using those elements for? There may be a simpler solution, or a more ergonomic change we can make.

For making automated HTTP requests to a website (as in submitting a form). I'm parsing all the elements as they come and just setting the values I need. Some forms are modified dynamically in the browser and I need to mirror those modifications.

@jhy
Copy link
Owner

jhy commented Jul 15, 2024

Thanks for confirming. That's definitely the main point of the FormElement, so we should make sure its use is ergonomic.

The main reason that we have the linking running at parse time is so that form element children that are moved out from the containing form tag during the parse are still associated with the form. But that doesn't mean that elements added dynamically ought not be associated as well.

Some rough thoughts:

Associating the new elements with the form automatically would be convenient, but may have a bit of an overhead and a low hit ratio. Linking it to the retrieval of the form would be ideal, but in your example it's just a plain cast so we don't have that opportunity. We do if the user is calling it via Elements#forms().

Maybe we can add a Element#asForm() method that runs a re-connect and simplifies the cast.

But in review of the code and use. I think it's best to reimplement FormElement#elements() to rebuild on demand and include the existing elements plus any others found as children.

@jhy jhy self-assigned this Jul 15, 2024
@jhy jhy added this to the 1.18.2 milestone Jul 15, 2024
@jhy jhy closed this as completed in 56a09ca Jul 15, 2024
@jhy
Copy link
Owner

jhy commented Jul 15, 2024

Thanks, done. @aditsu it would be great if you could review and test (do a mvn install from head), and LMK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants