Stewart Platt's Blog - Algorithms and data structures primer - Part 3/3

Algorithms and data structures primer - Part 3/3

Introduction

The last part of this primer finishes off with a note about some algorithms which can be applied to some of these data structures. Some

We then close with a few bits of pragmatic info which don't fit anywhere else.

Algorithms

The algorithms we are discussing in these parts fall into two distinct categories. The first are the algorithms required to implement ADTs, some of which have been covered in the previous part. However, we may also want to perform operations on our data structures such as iterating and sorting. This section will go over these common classes of algorithm briefly.

Iteration

A number of the data structures we discussed are trivial to iterate through. Lists and arrays are a simple linear traversal. Obviously for the basic linked list, only forward traversal is possible, but with doubly linked lists forward and reverse iteration is allowed. Circular linked lists can be made traversable once and only once by keeping track of the starting node.

Iterating through graphs

An important properties of graphs, which will largely determine the type of iteration algorithm you use, is whether it is cyclic or acyclic.

Iterating through trees

As discussed in the earlier slides, trees are a specialisation of graphs (specifically, directed acyclic graphs). As such, they can be iterated in a different manner.

Sorting

Sorting algorithms have a number of properties

I'd love to go on about sorting algorithms, but it would be another three parter at least! A nice little site (with graphical examples galore) can be found over at sorting-algorithms.com


Finishing off

Mathematics of ADTs

The previous description of ADTs was not exactly correct (It's one of these 'forget what you were taught before' moments). For practical purposes it works, but ADTs are truly mathematical entities.

Combining data structures

Sometimes there are cases where your usage pattern is very well defined and performance is paramount. In these cases it may be possible to use multiple data structures which link to each other.

For example, a list may be used for iterating in O(N) time complexity, whilst lookup of single items may be performed using a balanced tree.

Deletion would then be performed by proxy, by looking up the item in the balanced tree, yielding the list node reference and deleting both the tree and list nodes.

However, exactly what hybrid approach is used depends heavily on expected usage patterns of operations on the data. As with any implementation detail, this should not be prematurely optimised and should always be benchmarked!

Data mutability

Some implementations of ADTs, whether they be core language data types or library constructs, may have the additional property of immutability. Ultimately, this means you cannot change the value of the constructed object and alterations may only be realised by creating new objects. Examples of this are Haskell variables (by default), Python FrozenSets and a variety of others. Immutability can make it easier to reason about your program's behaviour, but shouldn't affect the ADT's complexity or interface.

A real world analogy

If you're still having trouble seeing how this all fits together, this is the example I used to teach my mother about data structures, algorithms and abstract data types.

Take a dictionary proper as an example. The dictionary is a key/value lookup for words and their respective definitions.

Now think of dictionary.com (in specific, the search functionality in isolation, in practice dictionary.com provides a multitude of different search methods). This also provides a key/value lookup for words and their respective definitions.

These two tools share the same conceptual properties (that is, they have the same Abstract Data Type, which is a Dictionary or Map), but the two are fundamentally different tools and this is down to the underlying data structure and the algorithms enacted upon these data structures.

In the former case, the key/value pairs are stored in an ordered array of fixed size. We cannot add/remove elements easily (we would need to rewrite a portion of the book). The algorithm we learn from an early age to use a dictionary is a slightly different version of a binary chop. We estimate the location of the word using some learned heuristic and then whittle our way down on the exact location of the word. This then maps trivially to a value.

In the latter case, the dictionary is likely a hash table or balanced tree. We look up an exact key and its value is returned in either constant or logarithmic time (best guess!).

You might ask "What advantage does the dictionary proper yield?" Well, if you don't know the exact spelling of a word, you may easily be able to find your target key without resorting to a linear search of the key space! Of course, dictionary.com has its own features which compensate for this.

In closing

There's a lot of mathematics behind this (Abstract Data Types are actually algebraic definitions)

Comments