Intermittent 503.64 response from Azure App Services
We have several APIs running as Azure App Services and have been experiencing intermittent outages in which 10-25% of the calls to those App Services return a 503.64 while the remaining calls (often to the same endpoint) return normal responses. None of the failing requests appear in App Insights or in the App Service's web server logs; we can only see them through tracing and logging in the calling services (App Gateway logs or other App Services making internal REST calls).
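Since the failures only surface on the calling side, we have been leaning on the callers' own logging to see them at all. As an illustration, a minimal polling probe along the lines of the sketch below (the URL and interval are placeholders, not our real setup) records the 503s with timestamps even though nothing appears in the target App Service's telemetry:

# Minimal polling probe (sketch only). The URL and interval are placeholders,
# not our real configuration. It logs any non-200 response with a timestamp so
# the intermittent 503.64s are visible even though nothing shows up in the
# target App Service's App Insights or web server logs.
import datetime
import time

import requests

PROBE_URL = "https://example-api.azurewebsites.net/Mapping/fw/1.0/Status"  # hypothetical
INTERVAL_SECONDS = 10

def probe_once(session: requests.Session) -> None:
    timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    try:
        response = session.get(PROBE_URL, timeout=10)
        if response.status_code != 200:
            print(f"{timestamp} HTTP {response.status_code}: {response.text[:200]}")
    except requests.RequestException as exc:
        print(f"{timestamp} request failed: {exc}")

if __name__ == "__main__":
    with requests.Session() as session:
        while True:
            probe_once(session)
            time.sleep(INTERVAL_SECONDS)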
We haven't been able to determine a pattern to when this problem occurs: the gap between disruptions has been as short as an hour and as long as 2-3 days, and each incident has lasted anywhere from 2 to 30 minutes. There is no correlation with traffic or system load, and it happens at any time of day or night. We have seen it on both I1 and I2 tier plans, and scaling our services out well beyond necessary capacity has not prevented it.
The 64 sub-status code suggests a rewrite issue:
<!-- Antares Rewrite provider codes--> <error id ="64" description="Exception in rewrite provider (probably SQL)" />
However, I'm confused by the fact that some of the rewritten calls succeed during the same period that others fail.
For completeness, here is the rewrite rule for one of the failing services. A public API endpoint is rewritten to an internal request address, while all other requests pass through unchanged to the controller:
<rewrite>
  <rules>
    <rule name="LegacyGetStatus">
      <match url="Mapping/fw/1.0/Status(.*)" />
      <action type="Rewrite" url="api/Status{R:1}" />
    </rule>
  </rules>
</rewrite>
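Since only some of the rewritten calls fail, one check we have in mind is to probe the legacy (rewritten) path and the direct api/Status path side by side during an incident; if the 503.64s only ever come back on the legacy URL, that would point more firmly at the rewrite step. A rough sketch, again with a placeholder host:

# Sketch: during an incident, hit the legacy (rewritten) route and the direct
# route side by side. The host name is a placeholder. If 503.64s only ever come
# back on the legacy path, that would point more firmly at the rewrite step.
import requests

HOST = "https://example-api.azurewebsites.net"  # hypothetical
PATHS = {
    "legacy (rewritten)": "/Mapping/fw/1.0/Status",
    "direct": "/api/Status",
}

def compare_paths() -> None:
    with requests.Session() as session:
        for label, path in PATHS.items():
            try:
                response = session.get(HOST + path, timeout=10)
                print(f"{label:>20}: HTTP {response.status_code}")
            except requests.RequestException as exc:
                print(f"{label:>20}: request failed ({exc})")

if __name__ == "__main__":
    compare_paths()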
We submitted a support ticket a week ago; in the meantime, we are trying to figure out what could be causing this and what we can do to mitigate the issue.
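On the mitigation side, one option we are weighing for the calling services is a simple retry with backoff on 503 responses from the internal REST calls; a rough sketch (hypothetical URL, placeholder retry counts and delays):

# Sketch of a calling-side mitigation we are weighing: retry internal REST calls
# that return 503, with a short exponential backoff. The URL, attempt count, and
# delays are placeholders, not values we have settled on.
import time

import requests

def get_with_retry(url: str, attempts: int = 3, base_delay: float = 0.5) -> requests.Response:
    """GET url, retrying on 503 responses and transport errors."""
    last_error = None
    for attempt in range(attempts):
        try:
            response = requests.get(url, timeout=10)
            if response.status_code != 503:
                return response
            last_error = RuntimeError(f"HTTP 503 on attempt {attempt + 1}")
        except requests.RequestException as exc:
            last_error = exc
        time.sleep(base_delay * (2 ** attempt))  # back off before the next try
    raise RuntimeError(f"all {attempts} attempts returned 503 or failed") from last_error

if __name__ == "__main__":
    # Hypothetical internal endpoint, used only to illustrate the call pattern.
    print(get_with_retry("https://example-api.azurewebsites.net/api/Status").status_code)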