Category Archives: development

Understanding processing time

I want to give a brief tutorial on understanding CPU utilization, since it is a common area of confusion during performance analysis.

CPU time

CPU time is the time a processor spends executing instructions for a given task. This is opposed to wall-clock time, which is the total elapsed time the whole computer takes to perform an operation. The “load” on a system is the ratio of clock ticks spent performing operations to clock ticks spent waiting for instructions over a given time period. Thus, load only makes sense as an average over a particular time period. During any single clock tick, the CPU is either processing an instruction or a HLT.
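To make the distinction concrete, here is a minimal C# sketch (the loop is just stand-in busy work, not a real benchmark) comparing the two measurements for the current process:

using System;
using System.Diagnostics;

class CpuVsWallClock
{
    static void Main()
    {
        var process = Process.GetCurrentProcess();
        var wallClock = Stopwatch.StartNew();
        TimeSpan cpuStart = process.TotalProcessorTime;

        // Stand-in workload: CPU-bound busy work.
        double x = 0;
        for (int i = 0; i < 100000000; i++) x += Math.Sqrt(i);

        wallClock.Stop();
        process.Refresh(); // re-read the process counters
        TimeSpan cpuUsed = process.TotalProcessorTime - cpuStart;

        // Wall-clock time includes any waiting; CPU time counts only
        // the ticks spent executing this process's instructions.
        Console.WriteLine("Wall-clock: {0:F0} ms", wallClock.Elapsed.TotalMilliseconds);
        Console.WriteLine("CPU time:   {0:F0} ms (result {1})", cpuUsed.TotalMilliseconds, x);
    }
}

For a purely CPU-bound loop like this the two numbers are close; add disk or network waits and the wall-clock figure grows while CPU time does not.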

Idle time

The CPU processes a stream of instructions fed from memory. The memory contains opcodes (commands) which tell the CPU which operation to perform and where in memory the data to operate on is located. The rate of clock ticks is constant – several gigahertz in a modern CPU – and during each clock tick, the CPU processes either an instruction to do some work, or a HLT – an opcode which tells the CPU to keep its components idle until the next cycle.

Monitoring CPU time

So if we were to “observe” a CPU at the clock-tick level of detail, we would see that it’s always either working or waiting. But we can’t actually do that, since the more closely we monitor clock cycles, the more clock cycles are needed to do the monitoring. So to get an accurate picture of activity, we have to step back to a level where the monitoring tool does not interfere too much with the process being observed.

Input/output overhead

Any given computer task has several components: CPU instructions, memory IO, disk IO, network IO, and many others. Only extremely simple tasks (like calculating π) can fit in the small (but very fast) memory buffers on the CPU itself. Real-world tasks almost always require the CPU to wait for other components to finish shuffling data back and forth into a state where it is ready for the CPU to work on. The ideal scenario is for the CPU to be kept as busy as possible (a constant 100% load) until the task is completed; this is the minimum possible time in which a task can be completed. If we were to monitor CPU activity during such a task, we’d see the load jump to 100% during the task, then drop back to the baseline when it’s done. But if the task is shorter than the measurement interval of the CPU monitoring tool, its load would be averaged across two periods, or it might not be detected at all.
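This averaging is essentially what monitoring tools do: sample the processor-time counters at a fixed interval and report the ratio. A minimal C# sketch of the idea for the current process (a real tool would sample system-wide counters instead):

using System;
using System.Diagnostics;
using System.Threading;

class LoadSampler
{
    static void Main()
    {
        var process = Process.GetCurrentProcess();
        TimeSpan lastCpu = process.TotalProcessorTime;
        TimeSpan interval = TimeSpan.FromSeconds(1);

        while (true)
        {
            Thread.Sleep(interval);
            process.Refresh();
            TimeSpan cpu = process.TotalProcessorTime;

            // Load = CPU time used / CPU time available during the interval.
            // A burst shorter than the interval shows up only as a diluted average.
            double load = (cpu - lastCpu).TotalMilliseconds /
                          (interval.TotalMilliseconds * Environment.ProcessorCount);
            Console.WriteLine("Load over the last second: {0:P0}", load);
            lastCpu = cpu;
        }
    }
}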

Multitasking

The situation is more complicated when there are multiple tasks to run, each of which requires some fraction of CPU time. The operating system will run the tasks concurrently. Both the processing time and the IO of the tasks will overlap, but because the operating system takes turns running each task, the total CPU load will be some combination which may be higher or lower than the sum of the individual tasks – depending on the other competing resources the tasks use. For example, two disk-intensive tasks will use less CPU than the sum of both running individually, because disk IO will be the bottleneck. But two CPU-intensive tasks will consume more than double the CPU time, because the CPU has to run both tasks and also handle the context switching between them.

Further reading

  • http://en.wikipedia.org/wiki/CPU_time
  • http://en.wikipedia.org/wiki/Load_(computing)
  • http://en.wikipedia.org/wiki/No_op
  • http://en.wikipedia.org/wiki/Idle_task
  • http://en.wikipedia.org/wiki/Computer_performance

Performance considerations for the Entity Framework execution pipeline

An ORM provides value by doing a lot of work for you – virtualizing databases as native objects and converting types automatically. But the work the ORM does to save developer effort has a cost too: it carries an inherent performance penalty and may encourage some bad development practices.

An ORM by definition has to do some work to convert a database schema into a native view. ORMs can try to minimize this performance penalty by defining the transformation at design time or by caching the database-to-native-object transformation at various points.

Entity Framework is built on top of ADO.Net, which is just an API that does not “know” anything about the database. To convert .Net code into SQL queries and the results back into CLR objects, Entity Framework performs a set of operations at three stages: (1) compile time, (2) first run, and (3) each execution. To improve performance, EF allows you to shift some work from stage 3 to stages 1 or 2. But the details can be tricky, and choosing the best optimization strategy requires understanding the EF query execution pipeline.

The EF execution pipeline

The following information is based on this MSDN EF Performance page.

There are six steps I want to comment on in EF execution:

  1. loading metadata
  2. generating views
  3. preparing the query
  4. executing the query
  5. tracking
  6. materializing objects

1: Loading metadata: The metadata is the mapping defined in your EDMX file. Loading it is a very expensive operation (see the breakdown here), but it only happens once. EF applications use more memory and pay a warm-up penalty, which should be ignored in performance analysis unless you are concerned with startup times. Models with a very large number of entities (200+) have a number of problems – see this and this.

2: Generating views: The local query views are static objects which are cached per application domain. Generating them is also an expensive operation, but they can be pre-generated and embedded in the application if you care about first-run times.

3: Preparing the query: Each unique query must be compiled into the EF equivalent of a stored procedure before it is executed. Microsoft says that the commands are cached for later executions, but I’m confused about what exactly is cached, because profiling shows that the query is compiled on every execution. In any case, Microsoft suggests caching the compiled query to avoid this penalty.

Caching precompiled queries is somewhat unwieldy, so it is only advisable in performance-critical contexts, but it makes a big difference for frequently executed queries.
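For illustration, here is roughly what a precompiled query looks like in EF 4 against an ObjectContext (MyEntities, Customer, and the Customers set are hypothetical names from a generated model):

using System;
using System.Data.Objects;
using System.Linq;

static class CustomerQueries
{
    // Compiled once per application domain; later invocations skip
    // the LINQ-to-command-tree translation step.
    public static readonly Func<MyEntities, string, IQueryable<Customer>> ByCity =
        CompiledQuery.Compile((MyEntities db, string city) =>
            db.Customers.Where(c => c.City == city));
}

// Usage: the first call pays the compilation cost; subsequent calls reuse it.
// using (var db = new MyEntities())
// {
//     var bostonCustomers = CustomerQueries.ByCity(db, "Boston").ToList();
// }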

(Note: Take care when using .Count() or .Any() to avoid unnecessary query recompilation, and avoid unnecessary enumeration when a simple boolean check will do.)
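For instance (using a hypothetical Orders set), an existence check should short-circuit rather than count:

// .Any() enumerates only far enough to find a single matching row.
bool hasOrders = db.Orders.Any(o => o.CustomerId == customerId);

// .Count() forces the database to count every matching row
// just so we can compare the total against zero.
bool hasOrdersSlow = db.Orders.Count(o => o.CustomerId == customerId) > 0;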

By the way, all the steps above can be skipped by using direct SQL queries or stored procedures with EF. The performance of direct entity SQL falls between pure ADO.Net and Entity Framework LINQ queries, so it is only useful as the occasional exception to LINQ queries against a data store.
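A sketch of the direct route (ExecuteStoreQuery is the ObjectContext method; the DbContext API exposes a similar Database.SqlQuery):

// Bypasses the LINQ translation steps entirely; EF only materializes
// the results. The {0} placeholder becomes a real DbParameter,
// which guards against SQL injection.
var names = db.ExecuteStoreQuery<string>(
    "SELECT Name FROM Customers WHERE City = {0}", "Boston").ToList();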

4: Executing the query: Query execution time depends on the underlying data source. Entity Framework is just a library built on top of ADO.Net, so pure ADO.Net queries are the benchmark for any ORM built on top of it. Because ADO.Net will always be faster than any framework layered over it, it can be used as a fallback when other optimizations have been exhausted.

5: Tracking: Tracking is used to detect changes for updates. If you only need to read data, you can get a small performance boost by disabling it.
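With the EF 4.1+ DbContext API this is the AsNoTracking extension (with ObjectContext you would set MergeOption.NoTracking instead). A minimal sketch against a hypothetical Customers set:

// Read-only query: entities are materialized but never registered
// with the change tracker, saving bookkeeping work per row.
var customers = db.Customers
    .AsNoTracking()
    .Where(c => c.IsActive)
    .ToList();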

6: Materializing objects: Each object returned from the database must be converted into a class instance before it can be used. There is no way to avoid this penalty, but it is worth keeping in mind that the less data there is, the faster it will be materialized – not to mention transferred over the wire.

For best performance, queries should be as specific as possible. I have noticed that ORMs encourage the bad habit of always selecting entire rows. This is because developers using an ORM tend to think of the database as an object repository rather than as a relational store, whereas raw SQL encourages manually selecting just the needed columns. (Unless one has the awful habit of “select *”.) So to minimize overhead, pull just the data you need (the MVVM pattern is helpful in this regard).
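For example, projecting just the columns a view needs instead of pulling whole entities (hypothetical model names again):

// The generated SQL selects only these two columns, not the entire row.
var emailRows = db.Customers
    .Where(c => c.IsActive)
    .Select(c => new { c.Name, c.Email })
    .ToList();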

Final thoughts:

As the ADO.NET program manager himself has pointed out, Entity Framework is inherently slower than raw ADO.Net SQL queries. But performance should always be balanced against productivity. There are some applications which are definitely not suitable for an ORM, and many others that are. Some operations can only be done using raw SQL or take forever using an ORM (“truncate table”, temp tables, etc.). I think the best approach is to use an ORM where appropriate and optimize in the specific scenarios where performance is inadequate. I do think that Entity Framework is the best, simplest, and safest choice of all the .Net ORMs.

These are just a few tips on Entity Framework performance. There are other tips and much more information in the pages below and the links therein.

More reading:

  • MSDN: Performance Considerations (Entity Framework)
  • Entity Framework FAQ: Performance
  • The Code Project: Performance and the Entity Framework
  • Performance Considerations for Entity Framework 5

Rules for good unit tests

  • Test method names should be sentences (This_method_does_this(){}) – see the example after this list
  • Test the happy path – the most common/important functionality (acceptance criteria should be executable)
  • Test at the highest level that is practical
  • Unit tests should not:
      • Talk to the database
      • Talk to the network
      • Touch the file system
      • Require changing business logic just so a test can be written
      • Depend on any other tests (they should run at any time, in any order)
      • Depend on environment variables (USE MOCKS!)
  • Tests should be fast (lengthy tests are doing something wrong):
      • Less than 10 lines of code
      • Only one or two logical asserts per test
  • Don’t write tests after development is done
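Here is a minimal sketch of a test following these rules, using NUnit and Moq (OrderCalculator and IPriceService are hypothetical, with the price lookup mocked so the test never touches a database):

using Moq;
using NUnit.Framework;

[TestFixture]
public class OrderCalculatorTests
{
    [Test]
    public void Total_includes_sales_tax_for_taxable_items()
    {
        // Mocked dependency: no database, network, or file system.
        var prices = new Mock<IPriceService>();
        prices.Setup(p => p.GetPrice("book")).Returns(10.00m);

        var calculator = new OrderCalculator(prices.Object, taxRate: 0.10m);

        decimal total = calculator.Total("book", quantity: 1);

        // One logical assert covering the happy path.
        Assert.AreEqual(11.00m, total);
    }
}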

Further reading:

  • Introducing BDD
  • A Set of Unit Testing Rules

Reading Excel files in .Net

This should work for most Excel versions, including both xls and xlsx:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.Linq;

const string ExcelConnString =
    @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES"";";

// Read the whole first worksheet into a DataSet.
// (physicalPath and the Tag class are defined elsewhere.)
var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]",
    String.Format(ExcelConnString, physicalPath));
var ds = new DataSet();
adapter.Fill(ds, "anyNameHere");
var data = ds.Tables["anyNameHere"].AsEnumerable();

// Project the rows with a non-empty "tag" column into our own type.
IEnumerable<Tag> tags =
    data.Where(x => x.Field<string>("tag") != string.Empty)
        .Select(x => new Tag
        {
            Description = x.Field<string>("Description")
        });

Basic AJAX with MVC3 & Razor

Suppose you have a page which requires you to load content inline in response to some action. In the example below, you have a filter and a “preview” button which needs to show some sample data.

You could have the button do a post and have the MVC handler parse the form fields and render the view with the data you want to show. But you may have a complex page with many tabs and want to optimize form performance. In such a case, you want to send just the form data for the current tab and inject the response into the current page. Here is how to do that:

Here is the HTML of the relevant section of the “Export” tab:

<fieldset>        
 
<div class="btn green preview">
<a href="#">Preview</a></div>
<div class="clear"></div>
 
<span>&nbsp;</span>    
</fieldset>
 
<div class="btn orange submit">
<a href="#">Export to CSV</a></div>
<div class="clear"></div>

Note the two anchors wrapped in divs with the btn class. Here is the relevant JavaScript:

$(document).ready(function () {

    $('.preview').click(function () {
        $('#preview').html('<strong>Loading...</strong>');
        // Serialize the parent form, post it, and inject the response HTML.
        $.post('Home/Preview/?' + $(this).closest("form").serialize(), function (data) {
            $('#preview').html(data);
        });
        return false; // prevent the default anchor navigation
    });

    $('.submit').click(function () {
        $('.submit a').html("Loading...");
        $(this).closest("form").submit();
        return false;
    });

});

The “Export” button submits the entire form. The “Preview” button, on the other hand:

  1. Shows the “loading” text.
  2. Serializes the content of the parent form.
  3. Posts the serialized content to a URL.
  4. Renders the response from that URL in the designated div.

Here is Preview.cshtml:

@{
    Layout = null;
}
@model Models.ExportFilter

<h2>@ViewBag.Message</h2>

<strong>Name, Source, Email, DateAdded</strong>
<ul>
    @foreach (var item in Model.MarketingList)
    {
        <li>@item.FirstName, @item.Source, @item.Email, @item.DateAdded.ToShortDateString()</li>
    }
</ul>

Note that I am overriding the default Layout because I don’t want to show the normal header/footer. _Blank.cshtml contains solely “@RenderBody()”.

The handler for the /Preview target is:

[HttpPost]
        public ActionResult Preview(ExportFilter filter)
        {
            var export = new EmailExporter(filter);
            List emailList = export.Process();
            ViewBag.Message = string.Format("{0:##,#} records to be exported", emailList.Count); 
            return View(filter); 
        }

Now when I click “preview”, I get a momentary “loading” screen and then the rendered view.

Customizing Work Items in TFS

A quick guide to customizing work items in Team Foundation Server:

1: Install Team Foundation Server Power Tools.

2: In Visual Studio, connect to your TFS collection and open the Work Item Template definition on the server.

3: Select the work item template you want to edit.

4: Today, I added a “Pass/Fail” field to the Test Case work item. Click “New” to add a new field.

5: You can specify a number of field types such as text, number, dropdown, etc. For a dropdown, specify the list of allowed values in the “Rules” tab.


6: After you add a new field, you need to add it to the layout.

7: You can also define rules for workflow transitions that depend on your new field.

10 simple questions to evaluate a software development organization

All credit and inspiration for the questions below goes to Joel Spolsky’s The Joel Test and Bill Tudor’s 2010 update.

My version differs in two respects: First, it has just ten questions, ordered from highest to lowest priority (in my personal opinion). Second, in addition to yes/no questions, it asks open-ended questions intended to elicit more informative answers in an interview or an analysis of a software team.

  1. Do you use source control? What kind?  What are the requirements/checks to check in code (assignment to work items, unit tests, peer review, etc)?
  2. Do you use a bug database to track all issues? How do you track progress and manage change?
  3. Do you use the best tools money can buy? For example: MSDN/Apple Dev accounts, dual monitors, and powerful workstations.
  4. Do you have a dedicated QA team? Are they involved in the requirement/release management process?
  5. Do you fix bugs and write new code at the same time? How do you balance the two?
  6. Do programmers have quiet working conditions and team meeting rooms?  Describe them.
  7. Do you solicit feedback from end users or customers during the development process?  How is it used?
  8. Do you do a daily build? Do your builds include automated unit tests?
  9. Do you have a requirements management system? Is it integrated with your source control?
  10. Do you create specification/requirements documents? Do you do it before, during, or after writing code?

Bad developer ≠ novice developer

My post on the nature of programming seems to have struck a nerve. Many commenters pondered what makes a developer great. “Ka” thought that:

“You [are] not born [a] good or great programmer, you become one with time and study and hard work. At the beginning, everybody is a bad programmer.”

I disagree. Developers are not born “great,” but greatness does not automatically come with experience. Conversely, lack of experience does not make a developer “bad.” The difference between a great developer and a bad developer is not in their domain knowledge, but their methodology. The distinguishing mark of a great developer is that he codes consciously. Put another way, a good developer always knows why he is doing something. From the perspective of personal ethics, this requires intellectual courage and integrity.

Let me give an illustration of what I mean from personal experience:

When I got into Objective-C development, I was a “bad” developer. Most of my experience was with .Net code, and jumping into the iPhone dev world was intimidating. As a result, I lacked the courage to learn the architecture. I tried to manipulate blocks of code found on the web without understanding what they were doing. I would copy and paste blocks of code and just change variable names. When things didn’t work, I would look for another block of code to substitute for the failing one, or enter “debugging hell” – running code over and over, making random changes and seeing if they worked. This is the hallmark of a bad developer – imitating without understanding. I kept this up for over a year. It’s not that I didn’t try to learn the language. I got several books and watched iTunes U classes. But the way I used the learning materials was to memorize blocks of code and look for places to stuff them into my code. I wasn’t actually learning the platform, just collecting samples. Some developers spend their entire careers this way. They carry collections of old code everywhere they go, and just grab chunks to insert into new programs. They may never select File => New File or File => New Project in their whole careers.

After writing a lot of bad, buggy code, I drifted back to the comfort of .Net. Recently however, I decided to change my attitude. I started by downloading some iPhone code samples and open-source applications. I started in main.m and went through each line of code. If I didn’t understand exactly why a certain symbol was used or what it did, I looked it up. I spent a lot of time on Cocoa Dev Central, Developer.Apple.com, and Stack Overflow looking up things like the reasons why you would assign, retain, or copy a property, or when exactly you need to release an object (for alloc, new, copy or retain) or what you can do with respondsToSelector. There’s really not that much complexity to programming languages – but if you don’t take the time to learn how things work, they will always seem difficult and mysterious. If I had just looked this stuff up to begin with, I would have been far more productive. But, I was intimidated by the environment and tried to shortcut the learning process by imitating without understanding.

Understanding anything complex requires the courage and integrity to engage in difficult, exhausting mental effort. It’s tempting to cheat yourself. It’s easier to spend more time in endless copying and debugging than to take the effort to understand and create. In the short run, that saves time. But in the long run, developers who understand their craft are magnitudes more productive than the monkey-see, monkey-do coders. This is the difference between the unprincipled kind of laziness that trades understanding for time and the principled kind of laziness which saves time by understanding.

There’s no happy ending to my story — yet. The proof of a developer is in his work, not his book smarts, and I have yet to produce something to brag about.

For more on the traits of great developers, read these posts by Dave Child and micahel.