
ASP.NET Core Apps Observability

Francisco Beltrao


Thank you Sergey Kanzhelev for the support and review of this ASP.NET Core Apps Observability article.

Modern software development practices value quick and continuous updates, following processes that minimize the impact of software failures. Identifying bugs early is important, but so is finding out whether changes are actually improving business value. These practices can only work when a monitoring solution is in place. This article explores options for adding observability to .NET Core apps, collected from interactions with customers using .NET Core in different environments. We will look at the OpenTelemetry and Application Insights SDKs and use them to add observability to a sample distributed application.

Identifying software errors and their business impact requires a monitoring solution that can observe and report how the system and its users behave. The collected data must provide the information needed to analyze and identify a bad update, answering questions such as:

  • Are we observing more errors than before?
  • Were there new error types?
  • Did the request duration unexpectedly increase compared to previous versions?
  • Has the throughput (req/sec) decreased?
  • Has the CPU and/or Memory usage increased?
  • Were there changes in our KPIs?
  • Are we selling less than before?
  • Did our visitor count decrease?

The impact of a bad system update can be minimized by combining the monitoring information with progressive deployment strategies such as canary, mirroring, rings, and blue/green.

Observability is Built on 3 Pillars:

  • Logging: collects information about events happening in the system, helping the team analyze unexpected application behavior. Searching through the logs of suspect services can provide the hint needed to identify the root cause of a problem, such as a service throwing out-of-memory exceptions, app configuration not reflecting expected values, calls to an external service using an incorrect address, calls to an external service returning unexpected results, or incoming requests with unexpected input.

  • Tracing: collects information to create an end-to-end view of how transactions are executed in a distributed system. A trace is like a stack trace spanning multiple applications. Once a problem has been recognized, traces are a good starting point for locating its source in distributed operations, for example when calls from service A to B take longer than normal or payment service calls are failing.

  • Metrics: provide a real-time indication of how the system is running. They can be used to build alerts, allowing the team to react proactively to unexpected values. Unlike logs and traces, the amount of data collected using metrics remains constant as the system load increases. Application problems are often first detected through abnormal metric values, such as CPU usage being higher than before, the payment error count spiking, or the queued item count growing without bound.

Adding Observability to a .NET Core Application

There are many ways to add observability aspects to an application. Dapr, for example, is a runtime for building distributed applications that transparently adds distributed tracing. Another example is the use of service meshes in Kubernetes (Istio, Linkerd).

Built-in and transparent tracing typically covers basic scenarios and answers generic questions, such as observed request duration or CPU trends. Other questions, such as custom KPIs or user behavior, require adding instrumentation to your code.

To illustrate how observability can be added to a .NET Core application, we will use the following asynchronous distributed transaction example:

Sample Observability Application Overview

  1. Main Api receives a request from a “source”.
  2. Main Api enriches the request body with the current day, obtained from Time Api.
  3. Main Api enqueues enriched request to a RabbitMQ queue for asynchronous processing.
  4. RabbitMQProcessor dequeues request.
  5. RabbitMQProcessor, as part of the request processing, calls Time Api to get dbtime.
  6. Time Api calls SQL Server to get current time.

To run the sample application locally (including dependencies and observability tools), follow this guide. The article will walk through adding each observability pillar (logging, tracing, metrics) to the sample asynchronous distributed transaction.

Note: for information on bootstrapping OpenTelemetry or Application Insights SDK please refer to the documentation: OpenTelemetry and Application Insights.

Logging

Logging was redesigned in .NET Core, bringing an integrated and extensible API. Built-in and external logging providers allow logs to be collected in multiple formats and sent to multiple targets. When deciding on a logging platform, consider the following features:

  • Centralized: allowing the collection/storage of all system logs in a central location.
  • Structured logging: allows you to add searchable metadata to logs.
  • Searchable: allows searching by multiple criteria (app version, date, category, level, text, metadata, etc.)
  • Configurable: allows changing verbosity without code changes, based on log level and/or scope (see the sketch after this list).
  • Integrated: integrated into tracing, facilitating analysis of traces and logs in the same tool.
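
For example, with the built-in Microsoft.Extensions.Logging providers, verbosity can be adjusted per category through filters. The sketch below is illustrative only; the "Sample" category name and the Startup class are assumptions, not code from the sample application:

public static IWebHost BuildWebHost(string[] args) =>
    WebHost.CreateDefaultBuilder(args)
        .ConfigureLogging(logging =>
        {
            // Reduce noise from framework categories
            logging.AddFilter("Microsoft", LogLevel.Warning);

            // Keep detailed logs for the application's own (assumed) namespace
            logging.AddFilter("Sample", LogLevel.Debug);
        })
        .UseStartup<Startup>()
        .Build();

In practice these filters are usually read from configuration (for example the Logging section of appsettings.json or environment variables), which is what enables changing verbosity without code changes or a redeploy.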

The sample application uses the ILogger interface for logging. The snippet below demonstrates structured logging, which captures events using a message template and generates information that is both human and machine readable:

var result = await repository.GetTimeFromSqlAsync();
logger.LogInformation("{operation} result is {result}", nameof(repository.GetTimeFromSqlAsync), result);

When using a logging backend that understands structured logs, such as Application Insights, you can search for instances of the example log item where “operation” is equal to “GetTimeFromSqlAsync”:

Observability Application Insights structured log search

Tracing

Tracing collects the information required to observe a transaction as it “walks” through the system. To be effective, it must be implemented in every service taking part in the transaction.

.NET Core defines a common way to create traces through the System.Diagnostics.Activity class. By using this class, dependency implementations (i.e. HTTP, SQL, Azure, EF Core, StackExchange.Redis, etc.) can create traces in a vendor-neutral way, independent of the monitoring tool used.

It is important to note that those activities will not automatically be available in a monitoring system. Publishing them is the responsibility of the monitoring SDK used. Typically, SDKs have built-in collectors for common activities, transferring them to the destination platform automatically.

In the last quarter of 2019 OpenTelemetry was announced, promising to standardize telemetry instrumentation and collection across languages and tools. Before OpenTelemetry (or its predecessors OpenCensus and OpenTracing), adding observability would often mean adding proprietary SDKs (in)directly to the code base.

The OpenTelemetry .NET SDK is currently in alpha. The Azure Monitor Application Insights team is investing in OpenTelemetry as a next step of Azure Monitor SDKs evolution.

Quick Intro on Tracing with OpenTelemetry

In a nutshell, OpenTelemetry collects traces using spans. A span delimits an operation (an HTTP request being processed, a dependency call). It contains start and end times, among other properties, a unique identifier (SpanId, 16 hexadecimal characters, 8 bytes) and a trace identifier (TraceId, 32 hexadecimal characters, 16 bytes). The trace identifier is used to correlate all spans of a given transaction. A span can contain child spans, like calls in a stack trace. If you are familiar with Azure Application Insights, the following table might help you map OpenTelemetry terms:

Application Insights            OpenTelemetry
Request, PageView               Span with span.kind = server
Dependency                      Span with span.kind = client
Id of Request and Dependency    SpanId
Operation_Id                    TraceId
Operation_ParentId              ParentId
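
To make the mapping concrete, the identifiers above can be inspected directly on a .NET Activity once the W3C id format is enabled. This is a minimal illustrative sketch, not code from the sample application:

// Minimal sketch: inspect the W3C identifiers carried by an Activity
Activity.DefaultIdFormat = ActivityIdFormat.W3C;

var activity = new Activity("Sample operation").Start();

Console.WriteLine(activity.TraceId);      // 32 hex characters, shared by every span of the transaction (Operation_Id)
Console.WriteLine(activity.SpanId);       // 16 hex characters, unique to this span (Id)
Console.WriteLine(activity.ParentSpanId); // parent span, empty for the root span (Operation_ParentId)

activity.Stop();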

Adding Tracing to a .NET Core Application

As mentioned previously, an SDK is needed in order to collect and publish distributed tracing in a .NET Core application. Application Insights SDK sends traces to its centralized database while OpenTelemetry supports multiple exporters (including Application Insights). When configured to use OpenTelemetry, the sample application sends traces to a Jaeger instance.

In the asynchronous distributed transaction scenario, we will track the following operations:

HTTP Requests between microservices

HTTP correlation propagation is part of both SDKs; the only requirement is setting the activity id format to W3C at application start:

public static void Main(string[] args)
{
    Activity.DefaultIdFormat = ActivityIdFormat.W3C;
    Activity.ForceDefaultIdFormat = true;

    // rest is omitted
}

Dependency calls (SQL, RabbitMQ)

Unlike the Application Insights SDK, OpenTelemetry (in early alpha) does not yet support SQL Server trace collection. A simple way to track dependencies with OpenTelemetry is to wrap the call, as in the following example:

var span = this.tracer.StartSpan("My external dependency", SpanKind.Client);
try
{
    return CallToMyDependency();
}
catch (Exception ex)
{
    span.Status = Status.Internal.WithDescription(ex.ToString());
    throw;
}
finally
{
    span.End();
}

Asynchronous Processing / Queued Items

There is no built-in trace correlation between publishing and processing a RabbitMQ message. Custom code is required to create the publishing activity (optional) and to reference the parent trace when the item is dequeued.

We previously covered creating traces by wrapping the dependency call. This option allows expressing additional semantic information, such as links between spans for batching and other fan-in patterns. Another option is to use System.Diagnostics.Activity, which is an SDK-independent way to create traces. This option has a limited set of features but is built into .NET.

These two options work well with each other, and the .NET team is working on improving the integration between .NET Activity and OpenTelemetry spans.

Creating an Operation Trace

The snippet below demonstrates how the publish operation trace can be created. It adds the trace information to the enqueued message header, which will later be used to link both operations.

Activity activity = null;
if (diagnosticSource.IsEnabled("Sample.RabbitMQ"))
{
    // Generates the Publishing to RabbitMQ trace
    // Only generated if there is an actual listener
    activity = new Activity("Publish to RabbitMQ");
    diagnosticSource.StartActivity(activity, null);
}

// Add current activity identifier to the RabbitMQ message
basicProperties.Headers.Add("traceparent", Activity.Current.Id);

channel.BasicPublish(...);

if (activity != null)
{
    // Signal the end of the activity
    diagnosticSource.StopActivity(activity, null);
}

A collector, which subscribes to target activities, is required to publish the trace to a backend. Implementing a collector is not a straightforward task and is mostly intended for SDK implementors. The snippet below is taken from the sample application, where a simplified, not production-ready RabbitMQ collector for OpenTelemetry was implemented:

public class RabbitMQListener : ListenerHandler
{
    public override void OnStartActivity(Activity activity, object payload)
    {
        var span = this.Tracer.StartSpanFromActivity(activity.OperationName, activity);

        foreach (var kv in activity.Tags)
            span.SetAttribute(kv.Key, kv.Value);
    }

    public override void OnStopActivity(Activity activity, object payload)
    {
        var span = this.Tracer.CurrentSpan;
        span.End();

        if (span is IDisposable disposableSpan)
        {
            disposableSpan.Dispose();
        }
    }
}

var subscriber = new DiagnosticSourceSubscriber(new RabbitMQListener("Sample.RabbitMQ", tracer), DefaultFilter);
subscriber.Subscribe();

For more information on how to build collectors, please refer to OpenTelemetry/Application Insights built-in collectors as well as this user guide.

Activity

As mentioned, HTTP requests in ASP.NET Core have built-in activity correlation injected by the framework. That is not the case for the RabbitMQ consumer. In order to continue the distributed transaction, we must create a span referencing the parent trace, which was injected into the message by the publisher. The snippet below uses an extension method to build the activity:

public static Activity ExtractActivity(this BasicDeliverEventArgs source, string name)
{
    var activity = new Activity(name ?? Constants.RabbitMQMessageActivityName);

    if (source.BasicProperties.Headers.TryGetValue("traceparent", out var rawTraceParent) && rawTraceParent is byte[] binRawTraceParent)
    {
        activity.SetParentId(Encoding.UTF8.GetString(binRawTraceParent));
    }

    return activity;
}

The activity is then used to create the concrete trace. In OpenTelemetry the code looks like this:

// Note: OpenTelemetry requires the activity to be started
activity.Start();
tracer.StartActiveSpanFromActivity(activity.OperationName, activity, SpanKind.Consumer, out span);

The snippet below creates the telemetry using Application Insights SDK:

// Note: Application Insights will start the activity
var operation = telemetryClient.StartOperation<DependencyTelemetry>(activity);

Using activities gives flexibility in terms of the SDK used, as they are a neutral way to create traces. Once instrumented, the distributed end-to-end transaction in Jaeger looks like this:

Distributed Trace in Jaeger

The same transaction in Application Insights looks like this:

Distributed Trace in Application Insights

When using a single monitoring solution for traces and logs, such as Application Insights, the logs become part of the end-to-end transaction:

Observability Application Insights: traces and logs

Metrics

There are common metrics applicable to most applications, like CPU usage, allocated memory, and request time, as well as business-specific ones like visitors, page views, sold items, and sent items. Exposing business metrics in a .NET Core application typically requires using an SDK.

Collecting metrics in .NET Core happens through third-party SDKs, which aggregate values locally before sending them to a backend. Most libraries have built-in collection for common application metrics. However, business-specific metrics need to be built into the application logic, since they are created based on events that occur in the application domain.

In the sample application we use metric counters for enqueued items, successfully processed items, and unsuccessfully processed items. The implementation in both SDKs is similar, requiring setting up a metric and its dimensions and, finally, tracking the counter values.

OpenTelemetry supports multiple exporters; we will be using the Prometheus exporter. Prometheus combined with Grafana, for visualization and alerting, is a popular choice for open-source monitoring. Application Insights supports metrics like any other instrumentation type, requiring no additional SDK or tool.

Defining a metric and tracking values using OpenTelemetry looks like this:

// Create counter
var simpleProcessor = new UngroupedBatcher(exporter, TimeSpan.FromSeconds(5));
var meterFactory = MeterFactory.Create(simpleProcessor);
var meter = meterFactory.GetMeter("Sample App");
var enqueuedCounter = meter.CreateInt64Counter("Enqueued Item");

// Incrementing counter for specific source
var labelSet = new Dictionary<string, string>() { { "Source", source } };
enqueuedCounter.Add(context, 1L, this.meter.GetLabelSet(labelSet));
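
The exporter variable used above is the Prometheus exporter instance. At the time of writing, creating it with the alpha packages looked roughly like the sketch below; the type and option names are taken from the alpha SDK and may have changed since, so treat them as assumptions:

// Sketch based on the alpha OpenTelemetry Prometheus exporter packages; APIs may differ in later versions
var promOptions = new PrometheusExporterOptions() { Url = "http://localhost:9184/metrics/" };
var exporter = new PrometheusExporter(promOptions);

// The exporter is then passed to the batcher shown in the previous snippet
var simpleProcessor = new UngroupedBatcher(exporter, TimeSpan.FromSeconds(5));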

The visualization with Grafana is illustrated in the image below:

Metrics with Grafana/Prometheus

The snippet below demonstrates how to define a metric and track its values using the Application Insights SDK:

// Create counter
var enqueuedCounter = telemetryClient.GetMetric(new MetricIdentifier("Sample App", "Enqueued Item", "Source"));

// Incrementing counter for specific source
enqueuedCounter.TrackValue(metricValue, source);

The visualization in Application Insights is illustrated below:

Observability Application Insights custom metrics

Troubleshooting

Now that we have added the 3 observability pillars to a sample application, let’s use them to troubleshoot a scenario where the application is experiencing problems.

The first signals of an application problem are usually detected as anomalies in metrics. The snapshot below illustrates such a scenario, where the number of failed processed items spikes (red line).

Metrics indicating failure

A possible next step is to look for hints in distributed traces, which should help us identify where the problem is happening. In Jaeger, searching with the tag “error=true” filters the results, listing transactions where at least one error happened.

Jaeger traces with error

In Application Insights, we can search for errors in end-to-end transactions by looking in the Failures/Dependencies or Failures/Exceptions views.

Search traces with error in Application Insights

Application Insights error details in trace

The problem seems to be related to the Sample.RabbitMQProcessor service. The logs of this service can help us identify the problem. When using the Application Insights logging provider, logs and traces are correlated and displayed in the same view:

Observability Application Insights errors and logs

Looking at the details, we discover that the exception InvalidEventNameException is being raised. Since we are logging the message payload, the details of the failed message are available in the monitoring tool. It appears the message being processed has an eventName value of “error”, which causes the exception to be raised.

When introducing observability into a .NET Core application, two decisions need to be made:

  • The backend(s) where collected data will be stored and analyzed.
  • How instrumentation will be added to the application code.

Depending on your organization, the monitoring tool might already be selected. However, if you do have the chance to make this decision, consider the following:

  • Centralized: having all data in a single place makes it simple to correlate information, for example logs, distributed traces, and CPU usage. If they are split, more effort is required.
  • Manageability: how simple is it to manage the monitoring tool? Is it hosted on the same machines/VMs where your application is running? In that case, shared infrastructure unavailability might leave you in the dark: when monitoring is not working, alerts won’t be triggered and metrics won’t be collected.
  • Vendor lock-in: if you need to run the same application in different environments (i.e. on premises and in the cloud), a solution that can run everywhere might be favored.
  • Application dependencies: parts of your infrastructure or tooling might require a specific monitoring vendor, for example Kubernetes scaling and/or progressive deployments based on Prometheus metrics.

Once the monitoring tool has been defined, choosing an SDK comes down to two options: using the one provided by the monitoring vendor, or using a library capable of integrating with multiple backends.

Vendor SDKs typically yield little or no surprises regarding stability and functionality. That is the case with Application Insights, for example: it is stable with a rich feature set, including live stream, a feature specific to this monitoring system.

OpenTelemetry

Using the OpenTelemetry SDK gives you more flexibility, offering integration with multiple monitoring backends. You can even mix them: a centralized monitoring solution for all collected data, with a subset sent to Prometheus to fulfill a specific requirement. If you are unsure whether OpenTelemetry is a good fit for your project, consider the following:

  • When is your project going to production? The SDK is currently in alpha, meaning breaking changes are expected and it is not yet production-ready.
  • Are you using vendor specific features not yet available through the OpenTelemetry SDK (specific collectors, etc.)?
  • Is your monitoring backend supported by the SDK?
  • Are you replacing a vendor SDK with OpenTelemetry? Plan some time to compare the two SDKs; OpenTelemetry exporters might differ from how the vendor SDK collects data.

Source code with the sample application can be found in this GitHub Repository.



Redesigning Configuration Refresh for Azure App Configuration


Overview

Since its inception, the .NET Core configuration provider for Azure App Configuration has provided the capability to monitor changes and sync them to the configuration within a running application. We recently redesigned this functionality to allow for on-demand refresh of the configuration. The new design paves the way for smarter applications that only refresh the configuration when necessary. As a result, inactive applications no longer have to monitor for configuration changes unnecessarily.
 

Initial design: Timer-based watch

In the initial design, configuration was kept in sync with Azure App Configuration using a watch mechanism which ran on a timer. At the time of initialization of the Azure App Configuration provider, users could specify the configuration settings to be updated and an optional polling interval. In case the polling interval was not specified, a default value of 30 seconds was used.

public static IWebHost BuildWebHost(string[] args)
{
    return WebHost.CreateDefaultBuilder(args)
        .ConfigureAppConfiguration((hostingContext, config) =>
        {
            // Load settings from Azure App Configuration
            // Set up the provider to listen for changes triggered by a sentinel value
            var settings = config.Build();
            string appConfigurationEndpoint = settings["AzureAppConfigurationEndpoint"];

            config.AddAzureAppConfiguration(options =>
            {
                options.ConnectWithManagedIdentity(appConfigurationEndpoint)
                    .Use(keyFilter: "WebDemo:*")
                    .WatchAndReloadAll(key: "WebDemo:Sentinel", label: LabelFilter.Null);
            });

            settings = config.Build();
        })
        .UseStartup<Startup>()
        .Build();
}

For example, in the above code snippet, Azure App Configuration would be polled for changes every 30 seconds. These calls would be made regardless of whether the application was active, resulting in unnecessary network and CPU usage in inactive applications. Applications needed a way to trigger a refresh of the configuration on demand, so that refreshes could be limited to active applications and unnecessary checks for changes avoided.

This timer-based watch mechanism had the following fundamental design flaws.

  1. It could not be invoked on-demand.
  2. It continued to run in the background even in applications that could be considered inactive.
  3. It promoted constant polling of configuration rather than a more intelligent approach of updating configuration when applications are active or need to ensure freshness.
     

New design: Activity-based refresh

The new refresh mechanism allows users to keep their configuration updated using a middleware to determine activity. As long as the ASP.NET Core web application continues to receive requests, the configuration settings continue to be updated from the configuration store.

The application can be configured to trigger refresh for each request by adding the Azure App Configuration middleware from package Microsoft.Azure.AppConfiguration.AspNetCore in your application’s startup code.

public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    app.UseAzureAppConfiguration();
    app.UseMvc();
}

At the time of initialization of the configuration provider, the user can use the ConfigureRefresh method to register the configuration settings to be updated with an optional cache expiration time. In case the cache expiration time is not specified, a default value of 30 seconds is used.

public static IWebHost BuildWebHost(string[] args)
{
    return WebHost.CreateDefaultBuilder(args)
        .ConfigureAppConfiguration((hostingContext, config) =>
        {
            // Load settings from Azure App Configuration
            // Set up the provider to listen for changes triggered by a sentinel value
            var settings = config.Build();
            string appConfigurationEndpoint = settings["AzureAppConfigurationEndpoint"];

            config.AddAzureAppConfiguration(options =>
            {
                options.ConnectWithManagedIdentity(appConfigurationEndpoint)
                    .Use(keyFilter: "WebDemo:*")
                    .ConfigureRefresh((refreshOptions) =>
                    {
                        // Indicates that all settings should be refreshed when the given key has changed
                        refreshOptions.Register(key: "WebDemo:Sentinel", label: LabelFilter.Null, refreshAll: true);
                    });
            });

            settings = config.Build();
        })
        .UseStartup<Startup>()
        .Build();
}

In order to keep the settings updated while avoiding unnecessary calls to the configuration store, an internal cache is used for each setting. Until the cached value of a setting has expired, the refresh operation does not update the value, even when the value has changed in the configuration store.
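
For illustration, the cache expiration can be set explicitly when registering the settings to refresh. The sketch below assumes the same options shown in the snippet above; the 60-second value is just an example:

config.AddAzureAppConfiguration(options =>
{
    options.ConnectWithManagedIdentity(appConfigurationEndpoint)
        .Use(keyFilter: "WebDemo:*")
        .ConfigureRefresh((refreshOptions) =>
        {
            refreshOptions.Register(key: "WebDemo:Sentinel", label: LabelFilter.Null, refreshAll: true);

            // Example value: wait at least 60 seconds before checking the store again
            refreshOptions.SetCacheExpiration(TimeSpan.FromSeconds(60));
        });
});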

Try it now!

For more information about Azure App Configuration, check out the following resources. You can find step-by-step tutorials that will help you get started with dynamic configuration using the new refresh mechanism within minutes. Please let us know what you think by filing issues on GitHub.

Overview: Azure App Configuration
Tutorial: Use dynamic configuration in an ASP.NET Core app
Tutorial: Use dynamic configuration in a .NET Core app
Related Blog: Configuring a Server-side Blazor app with Azure App Configuration


Software Engineer, Azure App Configuration



Troubleshooting ASP.NET Core Performance Problems

This is a guest post by Mike Rousos

I recently had an opportunity to help a developer with an ASP.NET Core app that was functionally correct but slow when under a heavy user load. We found a few different factors contributing to the app’s slowdown while investigating, but the majority of the issues were some variation of blocking threads that could have run in a non-blocking way. It was a good reminder for me just how crucial it is to use non-blocking patterns in multi-threaded scenarios like a web app.

Beware of Locks

One of the first problems we noticed (through CPU analysis with PerfView) was that a lot of time was spent in logging code paths. This was confirmed with ad hoc exploration of call stacks in the debugger which showed many threads blocked waiting to acquire a lock. It turns out some common logging code paths in the application were incorrectly flushing Application Insights telemetry. Flushing App Insights requires a global lock and should generally not be done manually during the course of an app’s execution. In this case, though, Application Insights was being flushed at least once per HTTP request and, under load, this became a large bottleneck!
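
Conceptually, the problematic logging path looked like the hypothetical sketch below (not the actual application code): flushing on every log call turns logging into a global synchronization point that request threads queue behind.

// Hypothetical reconstruction of the anti-pattern: do not flush per log call
public void LogOrder(TelemetryClient telemetryClient, string message)
{
    telemetryClient.TrackTrace(message);

    // Flush is intended for shutdown paths; calling it here on every request
    // makes logging a global bottleneck under load
    telemetryClient.Flush();
}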

You can see this sort of pattern in the images below, taken from a small repro I made. In this sample, I have an ASP.NET Core 2.0 web API that enables common CRUD operations against an Azure SQL database with Entity Framework Core. When load testing the service running on my laptop (not the best test environment), requests were processed in an average of about 0.27 seconds. After adding a custom ILoggerProvider that calls Console.WriteLine inside of a lock, though, the average response time rose to 1.85 seconds – a very noticeable difference for end users. Using PerfView and a debugger, we can see that a lot of time (66% of PerfView’s samples) is spent in the custom logging method and that a lot of worker threads are stuck there (delaying responses) while waiting for their turn with the lock.
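
The custom provider used for the repro boils down to something like the simplified sketch below (the type name is illustrative): every log call serializes on a single lock around a relatively slow Console.WriteLine.

// Simplified sketch of the repro's logger; names are illustrative
public class SlowConsoleLogger : ILogger
{
    private static readonly object _syncRoot = new object();

    public IDisposable BeginScope<TState>(TState state) => null;

    public bool IsEnabled(LogLevel logLevel) => true;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception exception,
        Func<TState, Exception, string> formatter)
    {
        // Under load, worker threads queue up here waiting for the lock
        lock (_syncRoot)
        {
            Console.WriteLine(formatter(state, exception));
        }
    }
}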

Something’s up with this logging call

Threads waiting on lock acquisition

ASP.NET Core’s Console logger used to have some locking like this in versions 1.0 and 1.1, causing it to be slow in high-traffic scenarios, but these issues have been addressed in ASP.NET Core 2.0. It is still a best practice to be mindful of logging in production, though.

For very performance-sensitive scenarios, you can use LoggerMessage to optimize logging even further. LoggerMessage allows defining log messages ahead of time so that message templates don’t need to be parsed every time a particular message is logged. More details are available in our documentation, but the basic pattern is that log messages are defined as strongly-typed delegates:

// This delegate logs a particular predefined message
private static readonly Action<ILogger, int, Exception> _retrievedWidgets =
    LoggerMessage.Define<int>(
        LogLevel.Information,
        new EventId(1, nameof(RetrievedWidgets)),
        "Retrieved {Count} widgets");

// A helper extension method to make it easy to call the
// LoggerMessage-produced delegate from an ILogger
public static void RetrievedWidgets(this ILogger logger, int count) =>
    _retrievedWidgets(logger, count, null);

Then, that delegate is invoked as needed for high-performance logging:

var widgets = await _dbContext.Widgets.AsNoTracking().ToListAsync();
_logger.RetrievedWidgets(widgets.Count);

Keep Asynchronous Calls Asynchronous

Another issue our investigation uncovered in the slow ASP.NET Core app was similar: calling Task.Wait() or Task.Result on asynchronous calls made from the app’s controllers instead of using await. By making controller actions async and awaiting these sorts of calls, the executing thread is freed to go serve other requests while waiting for the invoked task to complete.

I reproduced this issue in my sample application by replacing async calls in the action methods with synchronous alternatives. At first, this only caused a small slowdown (0.32 second average response instead of 0.27 seconds) because the async methods I was calling in the sample were all pretty quick. To simulate longer async tasks, I updated both the async and synchronous versions of my sample to have a Task.Delay(200) in each controller action (awaited in the async version and blocked on with .Wait() in the synchronous one). In the async case, average response time went from 0.27s to 0.46s, which is more or less what we would expect if each request has an extra pause of 200 ms. In the synchronous case, though, the average time went from 0.32 seconds to 1.47 seconds!
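
As a rough illustration (not the exact repro code), the two variants of a controller action look like this; the async version returns its thread to the pool while waiting, while the blocking version holds on to it:

// Async version: the thread is free to serve other requests during the waits
[HttpGet]
public async Task<IActionResult> GetWidgetsAsync()
{
    await Task.Delay(200); // simulate a slow dependency
    var widgets = await _dbContext.Widgets.AsNoTracking().ToListAsync();
    return Ok(widgets);
}

// Blocking version: the thread sits idle but unavailable for the entire wait
[HttpGet("sync")]
public IActionResult GetWidgetsBlocking()
{
    Task.Delay(200).Wait(); // simulate the same slow dependency, synchronously
    var widgets = _dbContext.Widgets.AsNoTracking().ToListAsync().Result;
    return Ok(widgets);
}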

The charts below demonstrate where a lot of this slowdown comes from. The green lines in the charts represent requests served per second and the red lines represent user load. In the first chart (which was taken while running the async version of my sample), you can see that as users increase, more requests are being served. In the second chart (corresponding to the Task.Wait() case), on the other hand, there’s a strange pattern of requests per second remaining flat for several minutes after user load increases and only then increasing to keep up. This is because the existing pool of threads serving requests couldn’t keep up with more users (since they were all blocked on Task.Wait() calls) and throughput didn’t improve until more threads were created.

Asynchronous RPS compared to user load

 

Synchronous RPS compared to user load

 

Attaching a debugger to both scenarios, I found that 75 managed threads were being used in the async test but 232 were in use in the synchronous test. Even though the synchronous test did eventually add enough threads to handle the incoming requests, calling Task.Result and Task.Wait can cause slowdowns when user load changes. Analyzers (like AsyncFixer) can help to find places where asynchronous alternatives can be used and there are EventSource events that can be used to find blocking calls at runtime, if needed.

Wrap-Up

There were some other perf issues in the application I helped investigate (server GC wasn’t enabled in ASP.NET Core 1.1 templates, for example, something that has been corrected in ASP.NET Core 2.0), but one common theme of the problems we found was around blocking threads unnecessarily. Whether it’s from lock contention or waiting on tasks to finish, it’s important to keep threads unblocked for good performance in ASP.NET Core apps.

If you’d like to dig into your own apps to look for perf trouble areas, check out the Channel9 PerfView tutorials for an overview of how PerfView can help uncover CPU and memory-related perf issues in .NET applications.