Performance considerations for the Entity Framework execution pipeline

Posted by & filed under development.

An ORM provides value by doing a lot of things for you – virtualizing databases as native objects and converting types automatically. But the work the ORM does to reduce developer work has a cost too – it has an inherent performance penalty and may encourage some bad development practices.

An ORM by definition has to do some work to convert a database schema into a native view. ORM’s can try to minimize this performance penalty by defining the transformation at design time or caching the database to native object transformation at different points.

Entity Framework is built on top of ADO.Net, which is just an API that does not “know” anything about the database. To convert .Net code into SQL queries and back into CLR objects, Entity Framework performs a set of operations at different stages: (1) compile time (2) first run (3) each execution. To improve performance, EF allows you to shift some work from step 3 to 1 or 2. But the details can be tricky and understanding the best optimization strategy requires understanding the EF query execution pipeline.

The EF execution pipeline

The following information is based on this MSDN EF Performance page.

There are six steps I want comment on in EF execution:

  1. loading metadata
  2. generating views
  3. preparing the query
  4. executing the query
  5. tracking
  6.  materializing objects

1: Loading metadata: The metadata is the mapping defined in your EDMX file. This is a very expensive operation (see the breakdown here), but it only happens once. EF applications use more memory and have a warm-up penalty which should be ignored in performance analysis if you are not concerned with startup times.  Models with with very large (200+) number of entities have a number of problems – see this and this.

2: Generating views: The local query views are static objects which are cached per application domain. This is also an expensive operation but it can be pre-generated and embedded in the application if you care about first run times.

3: Preparing the query: Each unique query must be compiled into the EF version of a stored procedure before it is executed. Microsoft says that the commands are cached for later executions, but I’m confused about what exactly is cached because profiling shows that the query is compiled every. In any case, Microsoft suggest caching the compiled query to avoid this penalty.

Caching precompiled queries is somewhat unwieldy, so it is only advisable in performance-critical contexts, but it makes a big difference for frequently executed queries.

(Note: Take care when using .Count() or Any() to avoid unnecessary query recompilation and avoid unnecessary enumeration when a simple boolean check will do.)

By the way, all the steps above can be skipped by using direct SQL queries or stored procedures with EF. The performance of direct entity SQL falls between pure ADO.Net and Entity Framework queries, so it is only useful as the occasional exception to LINQ queries against a data store.

4: Executing the query: Query execution time depends on the underlying data source. Entity Framework is just a library built on top of ADO.Net, so pure ADO.Net queries are the a benchmark for any ORM’s built on top of it. Because, ADO.Net will always be faster than any framework built on top of it, it can be used as a fallback when other options have been exhausted.

5: Tracking: tracking is used to track changes for updates. If you only need to read data, you can get a small performance boost by disabling it.

6: Materializing objects: each object returned from the database must be converted into a class instance to be used. There is no way to avoid this penalty, but it is worthwhile to keep in mind that the less data there is, the faster it will materialized – not to mention transferred over the wire.

For best performance, queries should be as specific as possible. I have noticed that ORM’s encourage the bad habit of always selecting an entire row. This is because developers using an ORM tend to think of the database as an object repository rather than as a relational store, whereas raw SQL queries encourage manually selecting the needed rows. (Unless one has the awful habit of “select *”.) So to minimize overhead, pull just the data you need (the MVVM pattern is helpful in this regard.)

Final thoughts:

As the ADO.NET program manager himself has pointed out, Entity Framework is inherently slower than ADO.Net SQL queries. But performance should always be balanced against productivity. There are some applications which are definitely not suitable for an ORM and many others that are. Some operations can only be done using raw SQL and can take forever using an ORM (“truncate table”, temp tables, etc).  I think the best approach is to use an ORM where appropriate and optimize in the specific scenarios where performance is inadequate. I do think that Entity Framework is the best, simplest, and the safest choice out of all .Net ORM’s.

These are just a few tips on Entity Framework performance. There are other tips and much more information in the pages below and the links therein.

More reading:

One Response to “Performance considerations for the Entity Framework execution pipeline”

Leave a Reply