Crawler doesn't generate anchor in record urls #1282

ArthurFlag · 2022-01-24T17:14:58Z

Description

Hi,
I'm indexing this website, which is based on Docusaurus 2 and uses Redoc for the API docs. Since I moved to the new infra, my search hits for the API docs lead to the correct page but don't take me to the correct header.

Steps to reproduce

Go to this website
Search for Get user
The first search result's link is https://docs.talon.one/management-api/, which is not as precise as it should be. It should be https://docs.talon.one/management-api/#operation/getUser.

The record doesn't contain the full url:

Record #20 

url: "https://docs.talon.one/management-api/"
url_without_variables: "https://docs.talon.one/management-api/"
url_without_anchor: "https://docs.talon.one/management-api/"  <----
anchor: ""
content: null
content_camel: null
lang: "en"
language: "en"
type: "lvl2"
...
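
To make the relationship between these fields concrete, here is a minimal sketch of what the record for the expected URL would contain. It relies only on the standard `URL` class. The `splitAnchor` helper is hypothetical and not part of the Crawler, and the assumption that `anchor` holds the fragment without the leading `#` is inferred from the field names rather than confirmed in this thread.

```javascript
// Hypothetical helper (not part of the Algolia Crawler): split a URL into
// the anchor-related fields seen in the record above.
function splitAnchor(rawUrl) {
  const parsed = new URL(rawUrl);
  // URL.hash includes the leading "#"; drop it to get the bare anchor.
  const anchor = parsed.hash.replace(/^#/, "");
  // Clearing the hash removes the fragment from the serialized URL.
  parsed.hash = "";
  return {
    url: rawUrl,
    url_without_anchor: parsed.toString(),
    anchor,
  };
}

const expectedRecord = splitAnchor(
  "https://docs.talon.one/management-api/#operation/getUser"
);
console.log(expectedRecord);
```

With the URL from the steps above, `anchor` would be `operation/getUser` and `url_without_anchor` would match the value already present in Record #20.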

My action looks like this:

    {
      indexName: "talon",
      pathsToMatch: ["https://docs.talon.one/management-api/"],
      recordExtractor: ({ $, helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: "",
              defaultValue: "Management API reference docs",
            },
            lvl1: ".api-content h1",
            lvl2: ".api-content h2",
            lvl3: ".api-content h3",
            lvl4: ".api-content h4",
            content: ".api-content h2 + div p",
          },
          indexHeadings: { from: 1, to: 3 },
        });
      },
    },

The HTML to crawl looks like this, so I suppose the H2 has all the info the Crawler would need.

What am I doing wrong 🙁 ?


ArthurFlag · 2022-01-26T16:05:06Z

Sorry to ping you directly @shortcuts, but any clue?

shortcuts · 2022-01-26T16:10:14Z

Hey,

(I'm off this week so not able to look deeply/no computer)

We changed our way to detect anchors, could you show a snippet of an anchor in your DOM?

It's possible that your text is not a child of your anchor (it may be a sibling or at another level), so we can't find it.

Edit: just saw the snippet, it's definitely this. We need to look into it.
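
The failure mode described above can be illustrated with a small sketch. This is not the Crawler's actual implementation, which this thread doesn't show. The node shape (`id`, `parent`, `children`), both lookup functions, and the sample markup are hypothetical. The sketch contrasts a lookup that only walks up the ancestors of the heading text with one that also checks siblings at each level.

```javascript
// Hypothetical, simplified DOM: plain objects with `id`, `parent`, `children`.
function node(tag, { id = "", children = [] } = {}) {
  const n = { tag, id, children, parent: null };
  for (const child of children) child.parent = n;
  return n;
}

// Strategy A: only the element itself and its ancestors may carry the anchor.
function anchorFromAncestors(el) {
  for (let cur = el; cur; cur = cur.parent) {
    if (cur.id) return cur.id;
  }
  return "";
}

// Strategy B: at each level, also accept an id carried by a sibling.
function anchorFromAncestorsOrSiblings(el) {
  for (let cur = el; cur; cur = cur.parent) {
    if (cur.id) return cur.id;
    const siblings = cur.parent ? cur.parent.children : [];
    const hit = siblings.find((s) => s !== cur && s.id);
    if (hit) return hit.id;
  }
  return "";
}

// Assumed markup, for illustration only:
// <h2><a id="operation/getUser"></a>Get user</h2>
const link = node("a", { id: "operation/getUser" });
const text = node("#text");
const h2 = node("h2", { children: [link, text] });

console.log(JSON.stringify(anchorFromAncestors(text))); // ""
console.log(JSON.stringify(anchorFromAncestorsOrSiblings(text))); // "operation/getUser"
```

In this toy tree the heading text sits next to the anchor element rather than inside it, so an ancestors-only lookup yields an empty anchor, which matches the `anchor: ""` seen in the record.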

philipvollet · 2022-01-27T06:36:32Z

Also here for the same problem!

philipvollet · 2022-02-04T09:48:31Z

Any update here?

shortcuts · 2022-02-04T09:49:54Z

Hey, all open issues will be reviewed next week.

axilleas · 2022-02-04T13:41:50Z

👋 just chiming in to say that it would be nice to have some docs about:

We changed our way to detect anchors

🙂

shortcuts · 2022-02-09T18:00:40Z

Update: we found the issue and will do a bit more testing before pushing the fix to prod.

@axilleas no docs needed for that case, it's a mistake 😬

shortcuts · 2022-02-14T14:45:00Z

Hey there,

A fix has been deployed, feel free to test (start a new crawl) and give feedback :D

philipvollet · 2022-02-15T15:03:32Z

It's still not working for us. Do we need to recreate the index or do something else?

shortcuts · 2022-02-15T15:04:12Z

Hey, what is your appId?

philipvollet · 2022-02-15T15:09:23Z

Y1LB128RON

shortcuts · 2022-02-15T15:12:07Z

No crawl was started for Y1LB128RON after #1282 (comment), you need to start a new one in order to see the fix applied.

ArthurFlag · 2022-02-16T15:44:47Z

Works for us, thanks @shortcuts ❤️

shortcuts · 2022-02-16T15:46:03Z

Cool! Seems to be fixed for everyone then, please feel free to let us know if you encounter any issue with the new indexing!

shortcuts added the "Investigation in progress" label Feb 9, 2022

shortcuts added the "Fix deployed" label and removed the "Investigation in progress" label Feb 14, 2022

shortcuts closed this as completed Feb 16, 2022

bojanrajh mentioned this issue Mar 21, 2023

Anchors are being stripped out (using sitemaps, linkExtractor and externalData) #1831

Open
