Algorithms 101: Basic Data Structures Pt. 1

12 Aug 2015

This article is intended to cover some basic data structures: arrays, linked lists, stacks and queues; their applications and implementations in Java. Source code of implementations used in this article is available in git repository.

Abstract and concrete data structures

Abstract data structure is a mathematical model, an interface declaring desired behaviour. Concrete data structure is an implementation of such interface. For example, collection or container is an abstract data structure. It says only that some items of particular type are stored inside it and declares methods for accessing and updating them. Array and list, on the other side, are concrete data structures. They describe the way items are stored (in continuous memory chunk versus as a chain of pointers) and implement methods declared in collection's interface.

Collections

Collection (Container) is an abstract data structure. It holds several items of some specified type T and offers methods for accessing and updating them. The following interface will be used throughout this article to describe containers:

package com.dancingrobot84.algorithms.datastructures;

public interface Collection<T> {
    /** Insert item at the beginning of collection **/
    void append(T item);

    /** Insert item at the end of collection **/
    void prepend(T item);

    /** Insert item at specified place of collection **/
    void insert(int index, T item);

    /** Retrieve item at specified place **/
    T get(int index);

    /** Return the size of collection **/
    int size();

    /** Remove item at specified place from collection **/
    void remove(int index);
}

Arrays

Array is a container that holds a number of items in contiguous area of memory. In its simplest form, static array, the capacity of a container is set up at the moment of its creation and may not be changed. Dynamic arrays, however, can grow automatically when current capacity is exceeded.

Arrays are used both on its own and as an underlying data structure for more sophisticated structures, e.g. heaps, associative arrays and adjacency matrices.

Arrays are supported out of the box in overwhelming majority of modern programming languages. Usually static arrays present as a syntactic feature. Here is an example of creating static array of 10 ints in Java:

int[] arrayOfInts = new int[10];

Dynamic arrays often shipped as utility classes of standard library (std::vector in C++ and ArrayList in Java).

Dynamic array implementation

Idea: let's start with a small static array. If there is no free space left to insert new item, then allocate new static array with extended capacity and copy all items from the old array into the new one. It is proven that allocating new array twice as large as the old one gives you O(1) amortized complexity of inserting new item at the end of an array.

Here is a simple implementation of dynamic array in Java:

package com.dancingrobot84.algorithms.datastructures.impl;

import java.util.Arrays;
import com.dancingrobot84.algorithms.datastructures.Collection;

public class DynamicArray<T> implements Collection<T> {

    public static <T> Collection<T> empty() {
        return new DynamicArray<T>();
    }

    @Override
    public void append(T item) {
        ensureCapacity(size + 1);
        items[size] = item;
        size += 1;
    }

    @Override
    public void prepend(T item) {
        ensureCapacity(size + 1);
        System.arraycopy(items, 0, items, 1, size);
        items[0] = item;
        size += 1;
    }

    @Override
    public void insert(int index, T item) {
        assert index >= 0 : "Index is out of range: " + Integer.toString(index);
        size = Math.max(index + 1, size + 1);
        ensureCapacity(size);
        System.arraycopy(items, index, items, index+1, size-index-1);
        items[index] = item;
    }

    @Override
    public void remove(int index) {
        checkRange(index);
        System.arraycopy(items, index+1, items, index, size-index-1);
        size -= 1;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T get(int index) {
        checkRange(index);
        return (T)items[index];
    }

    public void set(int index, T item) {
        checkRange(index);
        items[index] = item;
    }

    @Override
    public int size() {
        return size;
    }

    private int size = 0;
    private Object[] items = new Object[1];

    private DynamicArray() {
        // Disable subclassing
    }

    private void ensureCapacity(int capacity) {
        if (items.length < capacity) {
            int enlargeTo = Math.max(2 * items.length, capacity);
            items = Arrays.copyOf(items, enlargeTo);
        }
    }

    private void checkRange(int index) {
        assert index >= 0 && index < size : "Index is out of range: " + Integer.toString(index);
    }
}

Advanced implementation can shrink array when the number of items drops below some point, say, half of capacity. However, it should be implemented with cautious, because too eager behaviour can cause drastic change of complexity. Imagine an array which shrinks itself in half as soon as the number of items drops below half of capacity; when there are n/2 items and you remove one, then insert one and repeat these actions m times the overall complexity will be O(mn). It's better to shrink array by lesser amount, a third or a fourth of capacity for example.

Linked lists

Linked list is also a container. It stores its items in nodes. Each node consists of at least two fields: a value to store and a pointer to another node. Nodes are chained one to another and, thus, can be located in arbitrary places of memory (contrary to arrays). Linked lists can be divided into several groups according to the number of pointers in its nodes: singly-linked, doubly-linked and multiply-linked.

Linked lists can be used either on its own or as an underlying data structure for such ADTs as adjacency lists, stacks, queues and for collision resolution in hash tables.

Linked lists are present in many programming languages, for example, std::list in C++ and LinkedList in Java. Usually they are implemented as doubly-linked lists.

Doubly-linked list implementation

Idea: key idea of this implementation is to use head and tail sentinel nodes. They are constant for particular linked list object; they point to each other when the list is empty and to the first/last node otherwise. Head sentinel's prev is always null, just as tail sentinel's next. Main purpose of such sentinel nodes is to save us time and energy playing around null checks: this way we can be sure that list always has head and tail nodes, though they're merely decorative.

Here is a simple implementation of doubly-linked list in Java:

package com.dancingrobot84.algorithms.datastructures.impl;

import java.util.Arrays;
import com.dancingrobot84.algorithms.datastructures.Collection;

public class DynamicArray<T> implements Collection<T> {

    public static <T> Collection<T> empty() {
        return new DynamicArray<T>();
    }

    @Override
    public void append(T item) {
        ensureCapacity(size + 1);
        items[size] = item;
        size += 1;
    }

    @Override
    public void prepend(T item) {
        ensureCapacity(size + 1);
        System.arraycopy(items, 0, items, 1, size);
        items[0] = item;
        size += 1;
    }

    @Override
    public void insert(int index, T item) {
        assert index >= 0 : "Index is out of range: " + Integer.toString(index);
        size = Math.max(index + 1, size + 1);
        ensureCapacity(size);
        System.arraycopy(items, index, items, index+1, size-index-1);
        items[index] = item;
    }

    @Override
    public void remove(int index) {
        checkRange(index);
        System.arraycopy(items, index+1, items, index, size-index-1);
        size -= 1;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T get(int index) {
        checkRange(index);
        return (T)items[index];
    }

    public void set(int index, T item) {
        checkRange(index);
        items[index] = item;
    }

    @Override
    public int size() {
        return size;
    }

    private int size = 0;
    private Object[] items = new Object[1];

    private DynamicArray() {
        // Disable subclassing
    }

    private void ensureCapacity(int capacity) {
        if (items.length < capacity) {
            int enlargeTo = Math.max(2 * items.length, capacity);
            items = Arrays.copyOf(items, enlargeTo);
        }
    }

    private void checkRange(int index) {
        assert index >= 0 && index < size : "Index is out of range: " + Integer.toString(index);
    }
}

Complexity comparison

Here is complexity comparision of dynamic array and doubly linked list for all operations in Collection interface. As you see, linked list wins on prepending, while dynamic array wins on random access.

Operation	Array	Linked list
append(item)	O_am(1)	O(1)
prepend(item)	O(n)	O(1)
insert(index, item)	O(n)	O(n)
remove(index)	O(n)	O(n)
get(index)	O(1)	O(n)
size()	O(1)	O(1)
Space	O(n)	O(n)

Stacks and queues

Stacks and queues are abstract data structures that support limited access to their elements: they allow only appending and retrieval. They are distinguished by the particular retrieval order they support: stack is LIFO container ("last-in-first-out") and queue is FIFO container ("first-in-first-out"). Interfaces of stack and queue are presented below:

package com.dancingrobot84.algorithms.datastructures;

public interface Stack<T> {
    /** Insert new item at the end of stack **/
    void push(T item);

    /** Retrieve and remove item from the end of stack **/
    T pop();

    /** Return number of elements in stack **/
    int size();
}

package com.dancingrobot84.algorithms.datastructures;

public interface Queue<T> {
    /** Insert new item at the end of queue **/
    void enqueue(T item);

    /** Retrieve and remove item from the beginning of queue **/
    T dequeue();

    /** Return number of elements in queue **/
    int size();
}

Both stacks and queues are widely applicable in modern software engineering. Examples of stack usage are:

Function calls are stored in stack while program is executed
Many parsing and expression evaluation algorithms use stacks to hold values or currently processed lexemes
JVM is a stack-based virtual machine. It does not use registers; instead it stores all variables on stack

Examples of queue usage are:

Concurrent applications; for example, concurrent access to some resource means that there is a queue of requests and they are processed in order
Buffering and data transfer

Usually stacks and queues are implemented as adapters to another containers, for example, linked lists or dynamic arrays. In Java Queue is an interface implemented by classes ArrayDeque, LinkedList and others. Stack is a concrete class, but it is obsolete. It is recommended to use interface Deque instead and its implementations as it has methods of both stacks and queues.

Stack implementation

Explanations skipped as the implementation is quite straightforward.

package com.dancingrobot84.algorithms.datastructures.impl;

import com.dancingrobot84.algorithms.datastructures.Collection;
import com.dancingrobot84.algorithms.datastructures.Stack;

public class ArrayStack<T> implements Stack<T> {

    public static <T> Stack<T> empty() {
        return new ArrayStack<T>();
    }

    @Override
    public void push(T item) {
        items.append(item);
    }

    @Override
    public T pop() {
        assert items.size() > 0 : "Stack is empty";
        int index = items.size() - 1;
        T item = items.get(index);
        items.remove(index);
        return item;
    }

    @Override
    public int size() {
        return items.size();
    }

    private Collection<T> items = DynamicArray.empty();

    private ArrayStack() {
        // Disable subclassing
    }
}

Implementation of stack using doubly-linked list will be exactly the same. Complexity of operations is O(1) amortized for array and O(1) for list.

Queue implementation using dynamic array

Idea: this is standard two-pointer implementation of a queue.

Because in queue we need to retrieve element from the beginning of a queue, it is inefficient to shift the whole array back each time dequeue is called. Instead lets create a pointer to location where the real beginning of a queue is and shift it forward on dequeue. On enqueue, however, lets insert new item not just at the end of an array, but, if possible, in the free place between array's beginning and queue's beginning. In order to achieve this lets create another pointer to store location of the real end of a queue. When this two pointers meet lets relocate the whole array so that queue's beginning and end will be the same as underlying array's are.

package com.dancingrobot84.algorithms.datastructures.impl;

import com.dancingrobot84.algorithms.datastructures.Queue;

public class ArrayQueue<T> implements Queue<T> {

    public static <T> Queue<T> empty() {
        return new ArrayQueue<T>();
    }

    @Override
    public void enqueue(T item) {
        if (items.size() == queueEnd) {
            if (queueStart == 0) {
                items.append(item);
            } else {
                queueEnd = 0;
                items.set(0, item);
            }
            queueEnd++;
        } else {
            items.set(queueEnd, item);
            queueEnd++;
            if (queueEnd == queueStart) {
                relocate();
            }
        }
    }

    @Override
    public T dequeue() {
        assert queueEnd != queueStart : "Queue is empty";

        T item;
        item = items.get(queueStart);
        queueStart++;

        if (queueStart == queueEnd) {
            queueStart = 0;
            queueEnd = 0;
        } else if (queueStart == items.size()) {
            queueStart = 0;
        }

        return item;
    }

    @Override
    public int size() {
        if (queueEnd == queueStart) {
            return 0;
        } else if (queueEnd > queueStart) {
            return queueEnd - queueStart;
        } else {
            return items.size() - (queueStart - queueEnd);
        }
    }

    @SuppressWarnings("unchecked")
    private DynamicArray<T> items = (DynamicArray<T>) DynamicArray.empty();
    private int queueStart = 0;
    private int queueEnd = 0;

    private ArrayQueue() {
        // Disable subclassing
    }

    private void relocate() {
        @SuppressWarnings("unchecked")
        DynamicArray<T> newItems = (DynamicArray<T>) DynamicArray.empty();

        queueEnd--;
        for (int i = queueStart; i != queueEnd; i = (i+1) % items.size()) {
            newItems.append(items.get(i));
        }
        newItems.append(items.get(queueEnd));

        items = newItems;
        queueStart = 0;
        queueEnd = items.size();
    }
}

Complexity of this solution is O(1) amortized for both operations.

Queue implementation using doubly-linked list

This implementation is much easier, because in doubly-linked list we can remove item from both beginning and end in O(1).

package com.dancingrobot84.algorithms.datastructures.impl;

import com.dancingrobot84.algorithms.datastructures.Collection;
import com.dancingrobot84.algorithms.datastructures.Queue;

public class ListQueue<T> implements Queue<T> {

    public static <T> Queue<T> empty() {
        return new ListQueue<T>();
    }

    @Override
    public void enqueue(T item) {
        items.append(item);
    }

    @Override
    public T dequeue() {
        assert items.size() > 0 : "Queue is empty";
        T item = items.get(0);
        items.remove(0);
        return item;
    }

    @Override
    public int size() {
        return items.size();
    }

    private Collection<T> items = DoublyLinkedList.empty();

    private ListQueue() {
        // Disable subclassing
    }
}

Complexity of this solution is O(1) for both operations.

Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms
Skiena. The Algorithm Design Manual