Crawler doesn't generate anchor in record urls #1282

Closed
ArthurFlag opened this issue Jan 24, 2022 · 14 comments

Comments

@ArthurFlag

Description

Hi,
I'm indexing this website based on Docusaurus 2 and Redoc for the API docs. Since I moved to the new infra, my search hits for the API docs lead to the correct page but don't take me to the correct heading.

Steps to reproduce

  1. Go to this website
  2. Search for Get user
  3. The first search result's link is https://docs.talon.one/management-api/, which is not as accurate as it should be. It should be https://docs.talon.one/management-api/#operation/getUser.

The record doesn't contain the full URL:

Record #20 

url: "https://docs.talon.one/management-api/"
url_without_variables: "https://docs.talon.one/management-api/"
url_without_anchor: "https://docs.talon.one/management-api/"  <----
anchor: ""
content: null
content_camel: null
lang: "en"
language: "en"
type: "lvl2"
...
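
For comparison, here is a minimal sketch (plain JavaScript, not Crawler code) of how the three URL fields would relate for the expected link. It assumes `anchor` holds the fragment without the leading `#`; the helper name `splitRecordUrl` is hypothetical.

```javascript
// Hypothetical helper: split a full URL into the URL-related fields
// seen in the record above. Assumes `anchor` is stored without "#".
function splitRecordUrl(fullUrl) {
  const parsed = new URL(fullUrl);
  const anchor = parsed.hash.replace(/^#/, "");
  parsed.hash = ""; // drops the fragment from the serialized URL
  return {
    url: fullUrl,
    url_without_anchor: parsed.href,
    anchor,
  };
}

const expected = splitRecordUrl(
  "https://docs.talon.one/management-api/#operation/getUser"
);
console.log(expected.url_without_anchor); // https://docs.talon.one/management-api/
console.log(expected.anchor); // operation/getUser
```

In the record above, `anchor` is empty and `url` equals `url_without_anchor`, which is why the search hit cannot point at the heading.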

My action looks like this:

    {
      indexName: "talon",
      pathsToMatch: ["https://docs.talon.one/management-api/"],
      recordExtractor: ({ $, helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: "",
              defaultValue: "Management API reference docs",
            },
            lvl1: ".api-content h1",
            lvl2: ".api-content h2",
            lvl3: ".api-content h3",
            lvl4: ".api-content h4",
            content: ".api-content h2 + div p",
          },
          indexHeadings: { from: 1, to: 3 },
        });
      },
    },

The HTML to crawl looks like this, so the H2 has all the info the Crawler would need, I suppose.

[Screenshot: 2022-01-24 at 18:11:03]

What am I doing wrong 🙁 ?

@ArthurFlag
Author

Sorry to ping you directly @shortcuts, but any clue?

@shortcuts
Member

shortcuts commented Jan 26, 2022

Hey,

(I'm off this week so not able to look deeply/no computer)

We changed our way to detect anchors, could you show a snippet of an anchor in your DOM?

It's possible that your text is not a child of your anchor (it may be a sibling or at another level), so we can't find it.

Edit: just saw the snippet, it's definitely this. We need to look into it.
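
To illustrate the failure mode described above, here is a rough sketch using a tiny hand-rolled DOM model. It is not the Crawler's implementation. The markup is an assumption (the screenshot is not reproduced here): the `id` sits on a wrapper element and the heading text is a sibling of the link inside the `h2`. All function names are hypothetical.

```javascript
// Hypothetical, simplified DOM node: plain objects instead of a real DOM.
function node(tag, attrs = {}, children = []) {
  const n = { tag, attrs, children, parent: null };
  for (const child of children) {
    if (typeof child !== "string") child.parent = n;
  }
  return n;
}

// Strategy 1: look for an id on the heading itself or its descendants only.
function anchorFromDescendants(n) {
  if (typeof n === "string") return "";
  if (n.attrs.id) return n.attrs.id;
  for (const child of n.children) {
    const found = anchorFromDescendants(child);
    if (found) return found;
  }
  return "";
}

// Strategy 2: same as above, then fall back to the closest ancestor with an id.
function anchorWithAncestorFallback(heading) {
  const own = anchorFromDescendants(heading);
  if (own) return own;
  for (let p = heading.parent; p; p = p.parent) {
    if (p.attrs.id) return p.attrs.id;
  }
  return "";
}

// Assumed markup: <div id="operation/getUser"><h2><a href="#..."></a>Get user</h2></div>
const heading = node("h2", {}, [
  node("a", { href: "#operation/getUser" }),
  "Get user",
]);
// Building the wrapper sets heading.parent, which the fallback relies on.
const section = node("div", { id: "operation/getUser" }, [heading]);

console.log(JSON.stringify(anchorFromDescendants(heading))); // ""
console.log(anchorWithAncestorFallback(heading)); // operation/getUser
```

With this assumed structure, a lookup limited to the heading and its children yields an empty anchor, matching the `anchor: ""` in the record.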

@philipvollet

Also here for the same problem!

@philipvollet

Any update here?

@shortcuts
Member

Hey, all open issues will be reviewed next week.

@axilleas

axilleas commented Feb 4, 2022

👋 just chiming in to say that it would be nice to have some docs about:

"We changed our way to detect anchors"

🙂

@shortcuts added the "Investigation in progress" (This issue is being investigated) label on Feb 9, 2022
@shortcuts
Member

Update: we found the issue and will do a bit more testing before pushing the fix to prod.

@axilleas no docs needed for that case, it's a mistake 😬

@shortcuts
Member

shortcuts commented Feb 14, 2022

Hey there,

A fix has been deployed, feel free to test (start a new crawl) and give feedback :D
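
One way to check the result after a new crawl is to scan the records for missing anchors. The sketch below is only an illustration over an in-memory array; it does not call any Algolia API. The first sample record mirrors the one from the issue description, the second is a hypothetical post-fix record, and `findRecordsMissingAnchor` is a made-up helper name.

```javascript
// Hypothetical helper: return records whose anchor is empty
// or whose url carries no "#fragment".
function findRecordsMissingAnchor(records) {
  return records.filter(
    (record) => !record.anchor || !record.url.includes("#")
  );
}

const records = [
  // Shape taken from the record shown in the issue description.
  { url: "https://docs.talon.one/management-api/", anchor: "", type: "lvl2" },
  // Hypothetical record after the fix.
  {
    url: "https://docs.talon.one/management-api/#operation/getUser",
    anchor: "operation/getUser",
    type: "lvl2",
  },
];

const missing = findRecordsMissingAnchor(records);
console.log(missing.length); // 1
console.log(missing[0].url); // https://docs.talon.one/management-api/
```

Page-level records may legitimately have no anchor, so a check like this is most useful when restricted to heading-level records.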

@shortcuts added the "Fix deployed" label and removed the "Investigation in progress" (This issue is being investigated) label on Feb 14, 2022
@philipvollet

For us, it's still not working. Do we need to recreate the index or do something else?

@shortcuts
Member

Hey, what is your appId?

@philipvollet

Y1LB128RON

@shortcuts
Member

No crawl was started for Y1LB128RON after #1282 (comment). You need to start a new one in order to see the fix applied.

@ArthurFlag
Author

Works for us, thanks @shortcuts ❤️

@shortcuts
Member

Cool! Seems to be fixed for everyone then. Please feel free to let us know if you encounter any issues with the new indexing!
