-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Post Featured Image: Add a useFirstImageFromPost attribute #56573
Conversation
Size Change: +184 B (0%) Total Size: 1.69 MB
ℹ️ View Unchanged
|
if ( $attributes['useFirstImageFromPost'] && ! $featured_image ) { | ||
$content_post = get_post( $post_ID ); | ||
$content = $content_post->post_content; | ||
$processor = new WP_HTML_Tag_Processor( $content ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the peculiar use of the Tag Processor here I think it warrants an explanatory comment. something indicating what the goal is.
/*
* Transfer the image tag from the post into a new text snippet.
* Because the HTML API doesn't currently expose a way to extract
* HTML substrings this is necessary as a workaround. Of note, this
* is different than directly extracting the IMG tag:
* - If there are duplicate attributes in the source there will only be one in the output.
* - If there are single-quoted or unquoted attributes they will be double-quoted in the output.
* - If there are named character references in the attribute values they may be replaced with their direct code points. E.g. `…` becomes `…`.
*
* In the future there will likely be a mechanism to copy snippets of HTML from
* one document into another, via the HTML Processor's `get_outer_html()` or
* equivalent. When that happens it would be appropriate to replace this custom
* code with that canonical code.
*/
For one off projects I wouldn't be so worried, but I'd like all Core code to be clear about what is exemplary and should be copied vs. what is not meant to serve as a canonical example of use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a161d4d
5ea0213
to
a2ebf5c
Compare
Flaky tests detected in dc3d0e4. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/7293774380
|
useFirstImageFromPost && | ||
imageBlock?.attributes?.id | ||
) { | ||
featuredImage = imageBlock.attributes.id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be using another useState()
value here shouldn't we? otherwise I think we might accidentally wipe out the updated value we have.
const [ storedFeaturedImage ] = useEntityProps( … );
const featuredImage = useMemo(
() => {
if ( storedFeaturedImage ) {
return storedFeaturedImage;
}
if ( ! useFirstImageFromPost ) {
return;
}
const blocks = parse( postContent )
const imageBlock = …
return imageBlock?.attributes?.id;
},
[ storedFeaturedImage, postContent ]
);
postId | ||
); | ||
const blocks = parse( postContent ); | ||
const imageBlock = blocks.find( ( block ) => block.name === 'core/image' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be a reasonable place to trim how much work is going on to avoid adding typing lag. first of all we can hide it inside a useMemo()
call so that we don't recalculate it if the post hasn't changed since the last invocation. secondly, we can trim the post at the first image block closer, which is relatively easy to find,.
const firstImageCloser = /<!--\s+\/wp:(?:core/)?image\s+\/-->/.exec( postContent );
const content = firstImageCloser
? postContent.slice( 0, firstImageCloser.index + firstImageCloser[0].length )
: '';
const blocks = parse( content );
const imageBlock = blocks.find( ( { name } ) => name === 'core/image' );
for large posts this has the potential to significantly reduce the time and memory spent in finding the first image. we could also use the block tokenizer regex pattern (modified here) to also find the image opener and then we'd only need to parse the specific image block, but I think if we are looking for a first image, this will likely work well in most cases. it's easy enough to expand this though into a findFirstImageInPost()
function that would do everything quickly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. I think the regex should be /<!--\s+\/wp:(?:core\/)?image\s+-->/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on that missing \
, thank you.
Yeah @ockham and I were working on a similar problem and I think we may have some more convenient ways to do this later, but for now I think this will help a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're already going for a regex to find the image, may as well grab just the image instead of the whole post up until the image, right?
@dmsnell I expect you're more well versed in regex performance and block parsing than I am, but I believe this should do the trick. I grabbed the first part from the parser and simplified it for the image block.
<!--\s+wp:(?:core\/)?image\s+(?:{(?:(?:[^}]+|}+(?=})|(?!}\s+\/?-->).)*+)?}\s+)?-->(?s:.*)<!--\s+\/wp:(?:core\/)?image\s+-->
The entire match can be passed into the parser rather than slicing the post content. The regex is slightly more complicated, but the rest of the code gets slightly simpler.
This would be more efficient for all cases except for ones where the image is the first block on the page, I think, since the parser has to run the opening matcher regex anyway for every block when parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since we only care about the id
, we probably don't even need to parse
the entire block. The id
, if it exists, is always in the JSON attributes—it isn't sourced from the content.
const imageOpener = /<!--\s+wp:(?:core\/)?image\s+(?<attrs>{(?:(?:[^}]+|}+(?=})|(?!}\s+\/?-->).)*)?}\s+)?-->/.exec( postContent );
const imageId = imageOpener?.groups?.attrs && JSON.parse( imageOpener.groups.attrs )?.id;
I think something like this should work, but I didn't try running it.
I forgot JS RegEx doesn't support the possessive qualifier used in the last example, but I don't think that's a problem if we have valid JSON, so I removed it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajlende great questions and proposals 👏
the reason I did choose the closing tag was ease of the regex pattern and a "good enough" spirit compared to parsing the entire post. yours is better, but if we're trying to optimize further I'd recommend these as well:
- stay as close to the block parser regex patterns as possible
- don't combine the regexes. find the block opener, and if necessary, start at the offset following the opener and then find the closer.
- if all we need is the
id
and we're going to discard the imagesrc
then that seems fitting. the JS regex pattern found in the block default parser should be adequate; is it failing here?
all in all this is so much better performance wise than parsing the entire post. I hope that in the next couple releases we'll have a lazy block parser that mostly eliminates all the performance constraints that lead us to write specialized code like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if all we need is the id and we're going to discard the image src then that seems fitting. the JS regex pattern found in the block default parser should be adequate; is it failing here?
Maybe we can discard the check altogether and just use the block parser then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's probably still worth not using the block parser, though that would be fine. in this particular case we're doing a lot of work without the need to. my comment about the lazy parser is there to show where we're going, but it's not ready yet, and that's still in PHP. sometimes I get confused thinking about what's on the server vs. what's in the browser.
what I was trying to convey mostly is that we can reuse the regex pattern from the block parser to reliably get what we want, and if we do that, we get a lot of performance gain for almost no additional effort or complexity. if we start diverging from that regex and write a reasonable amount of custom code then we're diverging and making things harder for very little marginal gain over the most-basic optimization.,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for the reviews @richtabor and @dmsnell, I have addressed the points you raised. This is ready for another look :) |
85161be
to
47ba584
Compare
dc3d0e4
to
9ba8fb8
Compare
I updated this to use the regex that @ajlende wrote. If this is working as expected please can we merge this and then follow up with more performance improvements in another PR? |
9ba8fb8
to
df6a00c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's give it a try! I think this will be a nice update.
What?
This adds an option called "Use first image from post". When this option is enabled the Post Featured Image block will pull the first image from the post.
Why?
For users with a lot of old posts without featured images this is a useful way to display images in the query block without having to manually add many featured images.
How?
useFirstImageFromPost
.parse
.Testing Instructions
Screenshots or screencast
We decided not to expose this setting in the UI for now. Since this is a more advanced feature, we are happy for now to only expose it through an attribute that theme builders can add via the code.