Designing fast software


In today's short-attention-span world, it's important to make sure that your software product meets certain minimum speed requirements. If it doesn't, your target users will look elsewhere for someone who can provide the same service faster or more conveniently. If they find one, you've lost a potential user, and as far as that user is concerned your product might as well not exist. Let's use web browsers as a case study. In 2009, one year after the Chrome browser launched, it had 7.1% usage compared to Firefox's 46.6%. Fast forward to today, where Chrome's usage is 61.6% compared to Firefox's 23.6%. How did Chrome get such a leg up on Firefox? While being advertised on the main Google website certainly helped, its calling card was speed. Bottom line: performance matters!

If performance matters so much, we should pay attention to it in the design phase of our software projects, not just leave it as an afterthought. This goes against the popular notion that you should only worry about the functionality of your application and leave performance tuning until it is needed. While it is true that premature optimization is the root of many evils, no amount of post-design performance optimization can make up for poorly designed software. Your performance requirements will differ based on user expectations, business goals, and of course the work that is actually getting done. The performance expected when pulling and presenting a record from a data store is different from the performance expected when crunching through lots of data for a machine learning problem.

Ultimately, performance comes down to CPU cycles. One core can only perform one computation per cycle, which implies that to speed up a software task you can: use faster hardware that performs more cycles per second, parallelize your job so that multiple cores work on it at the same time, or reduce the number of cycles it takes to complete the job. This post focuses on the software solutions. However, giving granular advice on how to write performant software is beyond the scope of a single article. What I can do is list some reasons why systems are slow, along with design considerations to keep in mind in order to write fast software.

Excessive memory allocation / usage:
This is especially important in managed languages like Java and C#, where garbage collection is handled by the runtime. While the act of allocating memory is almost free compared to C/C++, the more memory you allocate, the longer and more frequently garbage collection runs. In addition, programs that frequently need to reach out to main memory and secondary storage end up being I/O bound. CPU-bound processes will always be faster than I/O-bound processes, because the rate at which data is transferred from main memory or secondary storage to the CPU is typically slower than the rate at which the CPU can execute instructions.
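As a minimal sketch of this idea in Python (chosen here just for brevity; the same principle applies in Java and C#), streaming values instead of materializing them all at once keeps allocation pressure low:

```python
# Summing a million squares two ways. The list comprehension allocates
# and holds every intermediate value in memory at once; the generator
# expression produces them one at a time, so each can be reclaimed
# immediately and extra memory use stays constant.
def sum_squares_list(n):
    return sum([i * i for i in range(n)])  # allocates a full list of n ints

def sum_squares_gen(n):
    return sum(i * i for i in range(n))    # streams values, O(1) extra memory

assert sum_squares_list(1_000) == sum_squares_gen(1_000)
```

Both produce the same answer; the second simply gives the memory manager far less to do.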

Inefficient algorithms:
It's fairly obvious that the efficiency of your algorithms is directly related to your software's performance. Algorithms with quadratic complexity or higher fail to scale, and there is always a performance benefit in reducing the complexity of your algorithm. Some techniques for improving an algorithm's performance are:
1. using a hybrid or adaptive algorithm (read links for more info),
2. avoiding unnecessary work, e.g. by caching the result of an expensive function call and returning the cached result when the same inputs occur again, or by providing a fast path for common cases.
It's also sometimes necessary to consider the performance of third-party or framework libraries / functions and switch to a library or function with a faster algorithm if one is available.
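The caching technique from point 2 is usually called memoization. A small Python sketch using the standard library's `functools.lru_cache` (naive Fibonacci is just a stand-in for any expensive function with repeated inputs):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this naive recursion recomputes the same
    # subproblems exponentially many times; with it, each fib(k) is
    # computed exactly once, making the whole call linear in n.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, computed with only 31 distinct calls
```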

Too many layers of abstraction:
While layers of abstraction are great for encapsulation, information and complexity hiding, re-use, and so on, they typically add overhead. Sometimes the overhead is trivial; other times it is significant. It therefore becomes a question of trade-offs: to gain performance it may be necessary to lose a layer of abstraction, and along with it both the layer's overhead and whatever benefits it provides.
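A toy Python sketch of the trade-off (the `Image` and `Buffer` classes here are hypothetical, invented purely for illustration):

```python
# Reading a value through two wrapper layers versus indexing the
# underlying list directly. Both return the same result, but every
# layered access pays extra attribute lookups and call frames, which
# adds up inside a hot loop.
class Buffer:
    def __init__(self, data):
        self.data = data

    def get(self, i):
        return self.data[i]

class Image:
    def __init__(self, data):
        self.buffer = Buffer(data)

    def pixel(self, i):
        return self.buffer.get(i)

raw = [10, 20, 30]
img = Image(raw)
assert img.pixel(1) == raw[1]  # same answer; the direct index skips two calls
```

Whether the layered version is worth keeping depends on how hot the path is and how much the abstraction buys you elsewhere.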

Bring the data closer to the computation (or vice versa):
This is closely related to my first point but takes more of a macro view. The closer the data being processed is to the computation, the faster the process can run. So if the data is on disk and can be brought into memory and managed there effectively (perhaps in chunks), we can increase performance. Another example is bulk downloading remote data and operating on it locally instead of making several individual remote calls. Hadoop scales big-data problems by distributing the data and sending the computation as close to the data as possible, at the machine, rack, or cluster level.
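The chunking idea can be sketched in a few lines of Python (using an in-memory stream so the example is self-contained; a real program would pass in an open file):

```python
import io

def count_newlines(stream, chunk_size=64 * 1024):
    # Pull data toward the computation in large chunks rather than one
    # byte or record at a time: a little memory buys far fewer I/O calls.
    count = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        count += chunk.count(b"\n")
    return count

data = io.BytesIO(b"line1\nline2\nline3\n")
print(count_newlines(data))  # 3
```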

Task parallelization opportunities:
This one is obvious. While parallelization can be difficult to do properly, it can certainly yield great results in the right situations. In addition to running multiple tasks in parallel on multiple threads, look for opportunities where, instead of waiting for a long-running request such as a web service or database call to return a response, your process can make the call, continue doing other things, and resume its original work when the call returns. One implementation of this pattern is the async and await keywords in C#.
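Python has the same async / await pattern; here is a minimal sketch where `fetch` stands in for a slow web-service or database call:

```python
import asyncio

async def fetch(name, delay):
    # Stand-in for a long-running remote call.
    await asyncio.sleep(delay)
    return name

async def main():
    # Both "calls" are started and then awaited together, so the total
    # wait is roughly max(delays) rather than their sum.
    return await asyncio.gather(fetch("users", 0.1), fetch("orders", 0.1))

print(asyncio.run(main()))  # ['users', 'orders']
```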

Avoid / minimize bottlenecks:
The benefits of multiple threads or processes can be limited when there is a bottleneck. A bottleneck is typically a resource that cannot be accessed or modified in parallel without compromising its integrity. There are cases where bottlenecks can be avoided (by not sharing data / resources), but sometimes we can't get around them. In those situations the solution is to hold a resource lock for as little time as possible and to keep the operations performed under the lock as simple as possible. Bottlenecks are also prevalent in horizontal scaling situations, where the database cannot scale like the rest of the infrastructure. Database sharding is a common approach to solving that problem.
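A small Python sketch of the "hold the lock briefly" advice, with `value * value` standing in for whatever expensive work a real task would do:

```python
import threading

counter = 0
lock = threading.Lock()

def record(value):
    global counter
    expensive = value * value  # do the expensive work OUTSIDE the lock
    with lock:                 # hold the lock only for the shared update
        counter += expensive

threads = [threading.Thread(target=record, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 285, the sum of squares 0..9
```

Had the squaring happened inside the `with lock:` block, the threads would serialize on the expensive work too, erasing most of the benefit of running them concurrently.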

In conclusion, there are many more performance factors this article could have covered; I've only gone skin deep. In future posts I'll take a deeper dive into these and other factors. Feel free to hit up the comments section with questions or comments.


Wikipedia has a good article on program optimization.
