Which Data Structure to Use? Data Structure Selection in Programming !



I want to know if you have seen a particular problem and used a particular data structure, and why? I usually end up using just arrays, lists, or hash maps for most of day to day work. If there is a particular algorithm needs to be implemented then I try to think of trees or linked list. Just wondering how do you decide which one is better? What are the tradeoffs?
–Posted at Google Groups


Data structures should be matched to the problem to be accomplished. What kinds of problems are we trying to solve? Many, perhaps most, problems can be solved with just a few basic data structures. Arrays and hashes are some of the most important of these.

As a general rule, the data structure will affect the runtime of your algorithm, but not the result.

So you can do everything with arrays. The problem is that, if the algorithm involves appending a lot of data to an array, you might have to grow it each time. This can involve reallocations and copies, alternatively making every array huge. On a small problem and a fast machine this might not matter, but otherwise programmers would probably look to linked lists.

The universal properties of data structures are the amount of memory used in storing the contents, and the time and additional memory each operation takes. You come to know those for some important kinds of data structures and look for a fit with the requirements on footprint or responsiveness.

One resulting tradeoff is the usual time vs. space — e.g., hash tables provide fast search but need many more slots than entries. Also, a given storage method may permit several algorithms for some operation that vary along this dimension.

The correct answer is of course that you should pick the data structure depending on the problem. In general, (in complex applications), you can always change the data structures dynamically to prepare the data for a given algorithm.

For any problem that I can think of, the data structures recommend themselves, and in a very obvious manner. It’s important to know what the different data structures do. You don’t even have to know how they do it (but you do have to understand O(f(n)) notation or you won’t know which one is better).

Cost-benefit analysis: What features does the data structure provide, what are costs (time, space, etc.) to using them, and how well does this combination suit the particular algorithm I have in mind?

  • I’d say that at least 90% of the data structures I use are simple arrays. The rest are mainly linked lists or binary trees. Very rarely is it worth using a hash or a red black tree.

  • If you need to find something with equality only, then hash.

  • If you need to find a range of something (e.g. where ‘x’ between ‘a’ and ‘b’ or where ‘y’ >= ‘z’) then use a tree or skiplist. Probably tree for reading data from files and storing them for particular arrangement e.g. sorted.

  • If you need to order a list by some kind of importance and modify the list on the fly, then a priority queue is wanted.

  • List is not ordered (unless you sort it, and then the sort order is not maintained until the next sort operation). So lists for random insertion or deletion within bunch of records.

  • I use double linked list for any scrollable cursor objects I want to build.

  • If you need to solve problems involving some sort of web like structure, then a graph is often wanted.

  • If you want to access data by content rather than by id number, there’s nothing like a hash table.

  • If you want to keep data sorted, despite arbitrary inserts and deletes, you are looking at a red black tree.

  • Another thing to think about: Hybrid data structures. Combine several data structures to create a data structure with the best properties of all of its atoms and the worst properties of none of them.

I think that anyone can memorize a list of (for example) 7 things pretty well. If you can memorize a list of 7 (or whatever) fundamental data structures and what they are used for then you will be able to choose them effectively when they are called for.

Sometimes, your choices won’t be optimal (e.g. you used a heap when you could have used a weak-heap). But the difference will probably be small enough that it won’t matter much anyway. However I program mainly in C. In Perl hashes are built in, so of course it is easy to use them even where, strictly, you don’t need constant access time.

Intertwined but in a slightly different vein, another tradeoff is among the operations on a given ADT (e.g. Union-Find) — different underlying storage methods might make one operation or another fast but not both at once! If you know something about the corresponding object’s usage pattern, look into that data structure which makes the pattern fast. But of course, this choice generally affects total storage used as well.

If (on the other hand) you use an unordered array when you should have used a hash table and there are millions of objects in it, the project will fail (until/unless someone repairs it).

Undergrad computer science courses do talk about simple data structures, but they don’t really do a good job of explaining that those data structures are what you should use most of the time, and they tend to spend a good deal of time discussing more advanced data structures (e.g. red-black trees, B-trees) that usually aren’t what you want.

Find a copy of “C++ Data Structures” / “Java Data Structures “etc book and try reading. Couldn’t hurt. Also, see “Effective STL” for recommendations on how to choose standard containers.

Algorithms are usually written around some kind of collection, and different collection can fit the bill. The performance requirements are usually the deciding factor when more than one data structure is capable of just doing the grunt work. See the “Effective STL” and other books in its bibliography.

Which Data Structure to Use? Type of Data Structure Selection in Programming !

tzury bar yochay said,

September 8, 2007 @ 11:26 pm

regarding your comment:
I just entered into Python world. I speak Perl. Once I grab up enough syntax, I’ll look at the framework. Thanks

I think you should take a look at web.py — http://webpy.org You are most likely to fall in love with it.

Tzury Bar Yochay tzury dot by at gmail dot com

Computer Science Data Graphics Hardware said,

February 15, 2008 @ 8:06 pm

Computer Consulting Services: Protecting Client Data

As a provider of computer consulting services, you should communicate to your clients that a dedicated server generally can enforce strong user passwords. This means your clients can:

importance using selection structures in programming said,

June 10, 2008 @ 1:34 am

[…] hash maps for most of day to … It??s important to know what the different data structures do. …http://www.sharpprogrammer.com/data-structures/which-data-structure-to-use-data-structure-selection-…Course designatorCIS 330 programming Language II 3 Credit Hours … information, creating and using […]

RSS feed for comments on this post · TrackBack URI


Leave a Comment

You must be logged in to post a comment.