Demystifying Data Structures

I was enjoying a beautiful spring day in Florida after a hectic week when my phone rang. It was a much-awaited Saturday, but as I guessed, the call was from the office. So much for my beautiful Saturday.

I had completed a production deployment the day before, and users were complaining about the poor performance of the system. Now this was serious.

Reluctantly, I took my laptop, rolled up my sleeves and got down to investigating. After a few hours I found the culprit: four lines of bad code. A variable was being set inside a 'for' loop, and each assignment involved a call to a remote database server. Yes, this sounds too naive, but at least some of us do occasionally write such code. Most of the time we have too little time to think, so we opt for quick wins and brute-force approaches. We give little or no thought to CPU cycles and memory, which in most cases are taken for granted. The result is working software that performs poorly over time.

One must be extremely careful with CPU and memory resources. Unknowingly, we are paying a price for them. The more CPU cycles we use to perform an operation, the longer some other process has to wait. Every operation consumes CPU cycles and memory that could have been used for more useful work.

How do we avoid this?

This can be achieved in many ways. The design or architecture of an application, built with the right technology, plays a major role. But however good the architecture is, the implementation still has its stake. Keep in mind that every line of code executed has a cost associated with it.

Sometimes we keep recalculating values that do not vary throughout the code. In most of these cases we could have declared an instance variable or a global variable, initialized it once, and used it multiple times instead of recalculating it every time. Trust me, this is an example of common carelessness among programmers.
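As a minimal sketch of this idea in C# (the class, method and value below are invented purely for illustration), an expensive lookup whose result never changes can be performed once and cached in a field:

using System;

public class InvoiceService
{
    // The tax rate never changes while the program runs, so it is loaded once
    // and cached here instead of being recalculated on every call.
    private static readonly decimal TaxRate = LoadTaxRate();

    private static decimal LoadTaxRate()
    {
        // Stand-in for an expensive operation: a config read, a database call, etc.
        Console.WriteLine("Loading tax rate...");
        return 0.18m;
    }

    public decimal TotalWithTax(decimal amount)
    {
        // Reuses the cached value; LoadTaxRate() is not called again here.
        return amount * (1 + TaxRate);
    }
}

Calling TotalWithTax inside a loop then touches only the cached field; the expensive load happens exactly once.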

First and foremost, we should understand the platform we are writing code for, so that we can take maximum advantage of what it offers for memory management and optimal execution of code. Every environment, such as the JVM or the CLR, has its own memory management and garbage collection techniques, so knowing them is essential for writing optimal code. Knowing how data is represented in memory also helps us manage data movement between call stacks. For example, it helps us decide whether to pass data between functions by value or by reference, where the latter copies only a reference instead of a huge object.
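A rough illustration of that last point in C# (both types are invented for this example): a large struct is copied in full on every call, while a class instance only has its reference copied.

// Hypothetical types, for illustration only.
public struct BigValueRecord        // passed by value: the whole struct is copied per call
{
    public long A, B, C, D, E, F, G, H;
}

public class BigReferenceRecord     // passed "by reference": only a reference is copied per call
{
    public long A, B, C, D, E, F, G, H;
}

public static class PassingDemo
{
    public static long SumValue(BigValueRecord r) => r.A + r.H;          // copies 64 bytes of data
    public static long SumReference(BigReferenceRecord r) => r.A + r.H;  // copies one reference
}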

Using the right data structure is another key factor that decides the complexity, or cost, of the operation you are performing. For example, a List and a LinkedList give us different costs for different operations. A List is array backed, so we can access elements by index, and that cost stays the same irrespective of the size of the List. Accessing an element by index is costlier in a LinkedList, because it is a doubly linked list that has to be traversed node by node. However, adding or removing elements in the middle is faster in a LinkedList. Another example is the use of hash tables for lookups, where the cost is constant irrespective of the size of the table.
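Here is a small sketch of those trade-offs using the .NET collections mentioned above (the element counts are arbitrary, chosen only to make the costs concrete):

using System;
using System.Collections.Generic;

public static class CollectionCosts
{
    public static void Main()
    {
        var list = new List<int>();                 // array-backed
        var linked = new LinkedList<int>();         // doubly linked nodes
        var table = new Dictionary<int, string>();  // hash table

        for (int i = 0; i < 100000; i++)
        {
            list.Add(i);
            linked.AddLast(i);
            table[i] = "value " + i;
        }

        int x = list[50000];        // O(1): direct index into the backing array
        list.Insert(0, -1);         // O(n): every existing element shifts one slot

        linked.AddFirst(-1);        // O(1): just relink two nodes
        // But reaching the middle of the LinkedList means walking node by node: O(n).

        bool found = table.ContainsKey(99999);   // O(1) on average, whatever the size

        Console.WriteLine(x + " " + found);
    }
}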

The algorithms and logic used in a program also determine its performance. For example, when sorting a list, quick sort or merge sort could be used instead of bubble sort. In the example below, strlen(s) is called on every iteration of the 'for' loop.

for (i = 0; i < strlen(s); ++i) {
  if (/* ... s[i] ... */) { /* ... */ }
}

If we rewrite it as

n = strlen(s);
for (i = 0; i < n; ++i) {
  if (/* ... s[i] ... */) { /* ... */ }
}
then many CPU cycles are saved by this simple change, because the length of s is computed only once instead of on every iteration.

A good programmer should also know when to use an abominable algorithm like bubble sort. For example, if the problem domain dictates there can never be more than five items (like the number of dice in a Yahtzee game), you know that you always have to sort at most five items. In that case, bubble sort might actually be the most efficient way to sort the items.
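A minimal bubble sort along those lines in C# (the five-element dice array is just the Yahtzee example above made concrete):

public static class DiceSorter
{
    // With at most five dice, bubble sort's O(n^2) worst case is only a
    // handful of comparisons, so its simplicity wins.
    public static void Sort(int[] dice)
    {
        for (int pass = 0; pass < dice.Length - 1; pass++)
        {
            for (int i = 0; i < dice.Length - 1 - pass; i++)
            {
                if (dice[i] > dice[i + 1])
                {
                    int tmp = dice[i];      // swap adjacent out-of-order values
                    dice[i] = dice[i + 1];
                    dice[i + 1] = tmp;
                }
            }
        }
    }
}

For example, given the array { 5, 2, 6, 1, 3 }, DiceSorter.Sort leaves it as { 1, 2, 3, 5, 6 }.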


About the Author

Jinu Raj
Senior Software Engineer

A specialist focused on design patterns, software efficiency, performance improvements, coding practices and algorithm efficiency. Expert in C#, core component development and integration of heterogeneous systems.
