Not Found (404) Error with Service Discovery - Docker / Consul #1329

Closed
johha86 opened this issue Sep 1, 2020 · 15 comments · Fixed by #1670
Labels: bug, merged, Service Discovery

Comments

@johha86

johha86 commented Sep 1, 2020

Expected Behavior

Reroute to the downstream path.

Actual Behavior

Every request returns a 404 (Not Found) error, with the following message in the Visual Studio output window:

warn: Ocelot.Responder.Middleware.ResponderMiddleware[0]  requestId: 0HM2E7R13PE70:00000001, previousRequestId: no previous request id, message: Error Code: ServicesAreEmptyError Message: services were empty for SharepointAPI errors found in ResponderMiddleware. Setting error response for request path:/File, request method: GET

Ocelot.Responder.Middleware.ResponderMiddleware: Warning: requestId: 0HM2E7R13PE70:00000001, previousRequestId: no previous request id, message: Error Code: ServicesAreEmptyError Message: services were empty for SharepointAPI errors found in ResponderMiddleware. Setting error response for request path:/File, request method: GET

Steps to Reproduce the Problem

I created two ASP.NET Core Web API projects. The first implements an API gateway using Ocelot with Consul, and the second is a simple Web API service with an endpoint routed to /api/v1/File. Both projects are started by a Docker Compose project.

  1. Ocelot.json file:
{
  "Routes": [
    {
      "DownstreamPathTemplate": "/api/v1/File",
      "DownstreamScheme": "http",
      "UpstreamPathTemplate": "/File",
      "UpstreamHttpMethod": [ "Get" ],
      "ServiceName": "SharepointAPI",
      "LoadBalancerOptions": {
        "Type": "LeastConnection"
      }
    }
  ],
  "GlobalConfiguration": {
    "ServiceDiscoveryProvider": {
      "Scheme": "http",
      "Host": "consul",
      "Port": 8500,
      "Type": "PollConsul"
    }
  }
}
  2. Docker Compose file
version: '3.4'

services:
    sharepoint.api:
        image: ${DOCKER_REGISTRY-}orderingapi
        hostname: sharepointapi
        build:
            context: .
            dockerfile: Sharepoint.API/Dockerfile
        ports:
            - "8002:80"

    private.gtw:
        image: ${DOCKER_REGISTRY-}privategtw
        build:
            context: .
            dockerfile: Private.Gtw/Dockerfile
        ports:
            - "7000:80"

    consul:
        image: consul:latest
        command: consul agent -dev -log-level=warn -ui -client=0.0.0.0 -bind='{{ GetPrivateIP }}'
        hostname: consul
        ports:
            - "8500:8500"
  3. Startup file
        public void ConfigureServices(IServiceCollection services)
        {
            services
                .AddOcelot()
                .AddConsul(); 
        }

        public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
        {
            if (env.IsDevelopment())
            {
                app.UseDeveloperExceptionPage();
            }

            app.UseRouting();

            app.UseEndpoints(endpoints =>
            {
                endpoints.MapControllers();
            });

            //ocelot
            app.UseOcelot().Wait();
        }
  4. Output from Consul Monitor
/ # consul monitor
2020-09-01T06:54:40.712Z [INFO]  agent: Deregistered service: service=SharepointAPI
2020-09-01T06:54:40.911Z [INFO]  agent: Synced service: service=SharepointAPI

Specifications

The service appears as registered in the web client.
If I ping between the Consul container, the private gateway, and the service container, they can all reach each other.

  • NuGet Ocelot version: 16.0.1
  • NuGet Ocelot.Provider.Consul version: 16.0.1
  • NuGet Consul version: 1.6.1.1
  • Platform: .NET Core 3.1
  • Subsystem: Windows 10

How can I solve this problem?

@matjazbravc

I have exactly the same problem. It's obvious that Ocelot does not resolve service routes properly with Consul when deployed to Docker.

@jlukawska
Contributor

Have you tried to repeat the request? I also receive the 404 status code the first time when I use PollConsul type, but subsequent requests are handled well. And it is not dependent on the use of containers.

@LeeStevens318

> Have you tried to repeat the request? I also receive the 404 status code the first time when I use PollConsul type, but subsequent requests are handled well. And it is not dependent on the use of containers.

I also have this issue. Have you figured out how to fix it?

@matjazbravc

matjazbravc commented Nov 7, 2020 via email

@johha86
Author

johha86 commented Dec 30, 2020

Hi @jlukawska, I noticed exactly what you describe some time ago. After cloning the project and inspecting what it does when it requests the collection of registered services, I saw that the first async call to the Consul API returns null, but subsequent requests return a valid collection.
Then I tried to replicate similar behavior in a simple project that uses only Consul, and the first call to the Consul API worked fine. So I concluded that the problem isn't related to Consul itself, and that performing a double request the first time isn't a solution to the issue.

@jlukawska
Contributor

@johha86, if I remember correctly, the problem is that the first polling starts when the application starts. And the application starts with the first request, which is why there is no service information yet.
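
To illustrate the timing described in this comment, here is a minimal, self-contained sketch of a cache that is filled only by a timer callback. `TimerBackedCache` and `LazyPollingDemo` are hypothetical names and this is not Ocelot's code; it only shows why a read that happens before the first tick sees an empty list.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a polling provider; not Ocelot's actual class.
public sealed class TimerBackedCache : IDisposable
{
    private readonly Timer _timer;
    private volatile List<string> _services = new List<string>();

    public TimerBackedCache(Func<List<string>> fetch, int pollingIntervalMs)
    {
        // The first tick fires only after pollingIntervalMs,
        // so the cache stays empty until then.
        _timer = new Timer(_ => _services = fetch(), null, pollingIntervalMs, pollingIntervalMs);
    }

    public List<string> Get() => _services;

    public void Dispose() => _timer.Dispose();
}

public static class LazyPollingDemo
{
    // Returns the number of cached services seen before and after the first tick.
    public static (int First, int Later) Run()
    {
        using var cache = new TimerBackedCache(
            () => new List<string> { "sharepointapi:80" }, pollingIntervalMs: 200);

        var first = cache.Get().Count; // read before the timer has ticked
        Thread.Sleep(1000);            // allow at least one tick to run
        var later = cache.Get().Count;
        return (first, later);
    }
}
```

The first read happens before the timer has ticked, so it should see an empty list; the later read should see the fetched entry.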

@3dotsDev

I have the same problem... the first call on every route gets me a 404... the second call is fine...
I can't find a solution to fix this issue... can anyone help me?

@johha86
Author

johha86 commented Aug 14, 2021

Hi, I found the cause of the problem and how to fix it. When you configure Ocelot to use Service Discovery in PollConsul mode, an IServiceDiscoveryProvider is created with a timer in its constructor. The timer callback polls Consul for the available services and stores them in a collection. But this timer is only initialized the first time you call Ocelot. So, the first time you call Ocelot, that collection is empty and you receive a 404 Not Found. The repository with this code is archived, so I can't open a PR with the fix. What I did was fork the repository, add the fix, and use my forked project instead of the NuGet package.
This is what the constructor looks like in my fork:

public PollConsul(int pollingInterval, IOcelotLoggerFactory factory, IServiceDiscoveryProvider consulServiceDiscoveryProvider)
{
    _logger = factory.CreateLogger<PollConsul>();
    _consulServiceDiscoveryProvider = consulServiceDiscoveryProvider;
    _services = new List<Service>();

    //  Do a first Poll before the first Timer callback.
    Poll().Wait();

    _timer = new Timer(async x =>
    {
        if (_polling)
        {
            return;
        }

        _polling = true;
        await Poll();
        _polling = false;
    }, null, pollingInterval, pollingInterval);
}

Another option is to implement an IHostedService that calls an endpoint when the application starts.
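
A rough sketch of what such a warm-up service might look like, assuming the gateway listens on http://localhost:80 inside its container. `GatewayWarmUpService`, the base address and the route list are hypothetical and not part of Ocelot. Because hosted services can be started before the web server is listening, the sketch waits for `ApplicationStarted` before sending requests.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Hypothetical warm-up service; the name, base address and routes are assumptions.
public sealed class GatewayWarmUpService : IHostedService
{
    // Upstream routes to touch once at startup (assumed; taken from this issue's Ocelot.json).
    private static readonly string[] RoutesToWarm = { "/File" };

    private readonly IHostApplicationLifetime _lifetime;
    private readonly HttpClient _client =
        new HttpClient { BaseAddress = new Uri("http://localhost:80") };

    public GatewayWarmUpService(IHostApplicationLifetime lifetime) => _lifetime = lifetime;

    public Task StartAsync(CancellationToken cancellationToken)
    {
        // Defer the calls until the server is actually accepting requests.
        _lifetime.ApplicationStarted.Register(() => { _ = WarmUpAsync(); });
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken)
    {
        _client.Dispose();
        return Task.CompletedTask;
    }

    private async Task WarmUpAsync()
    {
        foreach (var route in RoutesToWarm)
        {
            try
            {
                // The response is ignored; a 404 is expected on this first call.
                using var response = await _client.GetAsync(route, _lifetime.ApplicationStopping);
            }
            catch (Exception)
            {
                // Best effort only: a failed warm-up must not stop the gateway.
            }
        }
    }
}
```

It would be registered with `services.AddHostedService<GatewayWarmUpService>();` in `ConfigureServices`. This only absorbs the first 404 per listed route; requests that arrive before the first poll has completed could still fail.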

I will keep this issue open because the original code doesn't have a solution yet. Maybe I could maintain the original repository.

@3dotsDev

Hmmmm, I see your point...

But I'm not sure... your update runs on the first startup... then it collects the services and holds them...

But I get the 404 on every route's first call... not only the initial one.

@johha86
Author

johha86 commented Aug 16, 2021

Yes, I know. The forked repo with the solution is here. It contains the source code of Ocelot.Provider.Consul, and I committed my changes to the branch issues-1329-not-found-error-with-consul.

@ggnaegi
Member

ggnaegi commented Sep 10, 2021

@3dotsDev I'm experiencing the same issue. I'm currently reading the code and I'm trying to implement a suitable solution for me there.

We have the ConsulProviderFactory, it's a singleton. So I thought that carefully maintaining a list with the discovery providers (one discovery provider per service) would do the trick.

In the Polling class, I'm retrieving the clients the first time the Get method is called. The timer is also started there (using a SemaphoreSlim to avoid race conditions).

I'm not sure it's a good solution (the never-ending debate about locks, concurrent dictionaries, or SemaphoreSlim).

@ggnaegi
Member

ggnaegi commented Sep 15, 2021

@johha86 @3dotsDev
Hello

I'm wondering if this could be a solution...

The idea is the following:

"Avoiding polling" (with timer intervals): retrieve the services on every request until they are available. As soon as the services are available, serve the cached list until the "polling interval" has elapsed.

        public async Task<List<Service>> Get()
        {
            await _semaphore.WaitAsync();
            try
            {
                var refreshTime = _lastUpdateTime.AddMilliseconds(_pollingInterval);

                //checking elapsed time + if any service available
                if (refreshTime >= DateTime.UtcNow && _services.Any())
                {
                    return _services;
                }

                _logger.LogInformation($"Retrieving new client information for service: {ServiceName}");
                _services = await _consulServiceDiscoveryProvider.Get();
                _lastUpdateTime = DateTime.UtcNow;

                return _services;
            }
            finally
            {
                _semaphore.Release();
            }
        }

@johha86
Author

johha86 commented Sep 15, 2021

Hello @ggnaegi

If you want to "avoid polling", I think you only need to set up Ocelot following these instructions:
"ServiceDiscoveryProvider": { "Scheme": "https", "Host": "localhost", "Port": 8500, "Type": "Consul" }
The Consul type retrieves the services on every request. You can find more details here.
The problem I faced in this issue occurs when the PollConsul type is used in the service discovery configuration.
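
For comparison, a PollConsul configuration usually also sets a polling interval in milliseconds. The fragment below is a sketch that reuses this issue's host values; the interval of 100 is only an example value, not a recommendation.

```json
"ServiceDiscoveryProvider": {
  "Scheme": "http",
  "Host": "consul",
  "Port": 8500,
  "Type": "PollConsul",
  "PollingInterval": 100
}
```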

@ggnaegi
Member

ggnaegi commented Sep 15, 2021

@johha86 sure, but then it's for every request, and I thought it would help if we didn't keep retrieving the services on every request. In my case, I can see quite a performance improvement. It's a bit chatty otherwise: you call _consulServiceDiscoveryProvider.Get, which then calls Consul to get the Consul clients' information on every request.

So I don't want to avoid polling per se, I'm just looking for a solution that is not too chatty...

It's a hybrid implementation then:
Per request until services are returned, then like polling, but trying to avoid the race conditions that, IMHO, the timer introduces.

To summarize:

  • Like you, I was hoping for working polling to reduce calls to the Consul service.
  • I tried your solution, but it didn't quite work for me, because the polling is started per route template, I presume, not per service, so I ended up with the same problems as @3dotsDev (my Ocelot file is big).
  • I updated the code to maintain a collection of service discovery providers.
  • I have some services that aren't reachable when the gateway is started. If a call is made (could happen), then I would have to wait until the next timer callback. So I thought maybe I should try per request until the service is returned.
  • And finally I got cold feet and thought: could a thread possibly get _services=null because the services are updated in the timer callback?
    And then I came up with this "incredible" solution! :-)
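
The "collection of service discovery providers" idea from the list above could be sketched roughly as follows, assuming one provider per service name shared by all routes that use that service. `ProviderPool` is a hypothetical name and this is not the code from the fork or from Ocelot; wrapping each entry in `Lazy<T>` is one way to make sure the factory runs only once per name, even under concurrent first requests.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical pool: one provider instance per service name, reused across routes.
public sealed class ProviderPool<TProvider>
{
    private readonly ConcurrentDictionary<string, Lazy<TProvider>> _providers =
        new ConcurrentDictionary<string, Lazy<TProvider>>();

    private readonly Func<string, TProvider> _create;

    public ProviderPool(Func<string, TProvider> create) => _create = create;

    // GetOrAdd may build more than one Lazy wrapper under contention, but only the
    // stored wrapper's Value is read, so the provider factory runs once per name.
    public TProvider Get(string serviceName) =>
        _providers.GetOrAdd(serviceName, name => new Lazy<TProvider>(() => _create(name))).Value;
}
```

With a pool like this, two routes that both point at `SharepointAPI` would receive the same provider instance and therefore the same cached service list.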

@raman-m
Member

raman-m commented Sep 18, 2023

@johha86 @matjazbravc @LeeStevens318 @3dotsDev
I believe this bug will be fixed by PR #1670 from @ggnaegi

Could you review the code and/or verify the solution please?

raman-m added a commit that referenced this issue Sep 29, 2023
…ements and fix errors (#1670)

* fixing some issues in poll consul:
- Timer is not thread safe, avoiding usage of it
- No Ressources are returned for first call
- Using a providers pool, instead of creating a new provider instance

* line endings

* adding some test cases

* Using a lock instead of SemaphoreSlim

* Improve code readability

* CA2211: Non-constant fields should not be visible

* Use IOcelotLogger to remove warnings & messages of static code analysis (aka IDE0052)

* Fix errors with unit tests discovery. Remove legacy life hacks of discovering tests on .NET Core

* Update unit tests

* Also refactoring the kubernetes provider factory (like consul and eureka)

* shorten references...

* const before...

* Some minor fixes, using Equals Ordinal ignore case and a string constant for provider type definition instead of string litterals. Fixing usings.

* waiting a bit longer then?

* @RaynaldM code review

* renaming PollKubernetes to PollKube

* ... odd...

* ... very odd, we have an issue with configuration update duration...

* IDE0002: Name can be simplified

* All tests passing locally, hopefully it works online

* just a bit of cleanup

* Some missing braces and commas

* Update servicediscovery.rst: Review and update "Consul" section

---------

Co-authored-by: Guillaume Gnaegi
Co-authored-by: raman-m
raman-m added the "merged" label Sep 29, 2023
7 participants