Algorithms 101: Basic Data Structures Pt. 1
This article is intended to cover some basic data structures: arrays, linked lists, stacks and queues; their applications and implementations in Java. Source code of implementations used in this article is available in git repository.
Abstract and concrete data structures
Abstract data structure is a mathematical model, an interface declaring desired behaviour. Concrete data structure is an implementation of such interface. For example, collection or container is an abstract data structure. It says only that some items of particular type are stored inside it and declares methods for accessing and updating them. Array and list, on the other side, are concrete data structures. They describe the way items are stored (in continuous memory chunk versus as a chain of pointers) and implement methods declared in collection's interface.
Collections
Collection (Container) is an abstract data structure. It
holds several items of some specified type T
and offers methods for accessing
and updating them. The following interface will be used throughout this article
to describe containers:
package com.dancingrobot84.algorithms.datastructures;
public interface Collection<T> {
/** Insert item at the beginning of collection **/
void append(T item);
/** Insert item at the end of collection **/
void prepend(T item);
/** Insert item at specified place of collection **/
void insert(int index, T item);
/** Retrieve item at specified place **/
T get(int index);
/** Return the size of collection **/
int size();
/** Remove item at specified place from collection **/
void remove(int index);
}
Arrays
Array is a container that holds a number of items in contiguous area of memory. In its simplest form, static array, the capacity of a container is set up at the moment of its creation and may not be changed. Dynamic arrays, however, can grow automatically when current capacity is exceeded.
Arrays are used both on its own and as an underlying data structure for more sophisticated structures, e.g. heaps, associative arrays and adjacency matrices.
Arrays are supported out of the box in overwhelming majority of modern programming languages. Usually static arrays present as a syntactic feature. Here is an example of creating static array of 10 ints in Java:
int[] arrayOfInts = new int[10];
Dynamic arrays often shipped as utility classes of standard library
(std::vector
in C++ and ArrayList
in Java).
Dynamic array implementation
Idea: let's start with a small static array. If there is no free space left
to insert new item, then allocate new static array with extended capacity and copy
all items from the old array into the new one. It is proven that allocating new
array twice as large as the old one gives you O(1) amortized
complexity of
inserting new item at the end of an array.
Here is a simple implementation of dynamic array in Java:
package com.dancingrobot84.algorithms.datastructures.impl;
import java.util.Arrays;
import com.dancingrobot84.algorithms.datastructures.Collection;
public class DynamicArray<T> implements Collection<T> {
public static <T> Collection<T> empty() {
return new DynamicArray<T>();
}
@Override
public void append(T item) {
ensureCapacity(size + 1);
items[size] = item;
size += 1;
}
@Override
public void prepend(T item) {
ensureCapacity(size + 1);
System.arraycopy(items, 0, items, 1, size);
items[0] = item;
size += 1;
}
@Override
public void insert(int index, T item) {
assert index >= 0 : "Index is out of range: " + Integer.toString(index);
size = Math.max(index + 1, size + 1);
ensureCapacity(size);
System.arraycopy(items, index, items, index+1, size-index-1);
items[index] = item;
}
@Override
public void remove(int index) {
checkRange(index);
System.arraycopy(items, index+1, items, index, size-index-1);
size -= 1;
}
@Override
@SuppressWarnings("unchecked")
public T get(int index) {
checkRange(index);
return (T)items[index];
}
public void set(int index, T item) {
checkRange(index);
items[index] = item;
}
@Override
public int size() {
return size;
}
private int size = 0;
private Object[] items = new Object[1];
private DynamicArray() {
// Disable subclassing
}
private void ensureCapacity(int capacity) {
if (items.length < capacity) {
int enlargeTo = Math.max(2 * items.length, capacity);
items = Arrays.copyOf(items, enlargeTo);
}
}
private void checkRange(int index) {
assert index >= 0 && index < size : "Index is out of range: " + Integer.toString(index);
}
}
Advanced implementation can shrink array when the number of items drops below
some point, say, half of capacity. However, it should be implemented with
cautious, because too eager behaviour can cause drastic change of complexity.
Imagine an array which shrinks itself in half as soon as the number of items
drops below half of capacity; when there are n/2
items and you remove one,
then insert one and repeat these actions m
times the overall complexity will
be O(mn)
. It's better to shrink array by lesser amount, a third or a fourth of
capacity for example.
Linked lists
Linked list is also a container. It stores its items in nodes. Each node consists of at least two fields: a value to store and a pointer to another node. Nodes are chained one to another and, thus, can be located in arbitrary places of memory (contrary to arrays). Linked lists can be divided into several groups according to the number of pointers in its nodes: singly-linked, doubly-linked and multiply-linked.
Linked lists can be used either on its own or as an underlying data structure for such ADTs as adjacency lists, stacks, queues and for collision resolution in hash tables.
Linked lists are present in many programming languages, for example, std::list
in C++ and LinkedList
in Java. Usually they are implemented as doubly-linked
lists.
Doubly-linked list implementation
Idea: key idea of this implementation is to use head and tail sentinel
nodes. They are constant for particular linked list object; they point to each
other when the list is empty and to the first/last node otherwise. Head
sentinel's prev
is always null
, just as tail sentinel's next
. Main
purpose of such sentinel nodes is to save us time and energy playing around
null
checks: this way we can be sure that list always has head and tail nodes,
though they're merely decorative.
Here is a simple implementation of doubly-linked list in Java:
package com.dancingrobot84.algorithms.datastructures.impl;
import java.util.Arrays;
import com.dancingrobot84.algorithms.datastructures.Collection;
public class DynamicArray<T> implements Collection<T> {
public static <T> Collection<T> empty() {
return new DynamicArray<T>();
}
@Override
public void append(T item) {
ensureCapacity(size + 1);
items[size] = item;
size += 1;
}
@Override
public void prepend(T item) {
ensureCapacity(size + 1);
System.arraycopy(items, 0, items, 1, size);
items[0] = item;
size += 1;
}
@Override
public void insert(int index, T item) {
assert index >= 0 : "Index is out of range: " + Integer.toString(index);
size = Math.max(index + 1, size + 1);
ensureCapacity(size);
System.arraycopy(items, index, items, index+1, size-index-1);
items[index] = item;
}
@Override
public void remove(int index) {
checkRange(index);
System.arraycopy(items, index+1, items, index, size-index-1);
size -= 1;
}
@Override
@SuppressWarnings("unchecked")
public T get(int index) {
checkRange(index);
return (T)items[index];
}
public void set(int index, T item) {
checkRange(index);
items[index] = item;
}
@Override
public int size() {
return size;
}
private int size = 0;
private Object[] items = new Object[1];
private DynamicArray() {
// Disable subclassing
}
private void ensureCapacity(int capacity) {
if (items.length < capacity) {
int enlargeTo = Math.max(2 * items.length, capacity);
items = Arrays.copyOf(items, enlargeTo);
}
}
private void checkRange(int index) {
assert index >= 0 && index < size : "Index is out of range: " + Integer.toString(index);
}
}
Complexity comparison
Here is complexity comparision of dynamic array and doubly linked list for all
operations in Collection
interface. As you see, linked list wins on
prepending, while dynamic array wins on random access.
Operation | Array | Linked list |
---|---|---|
append(item) | Oam(1) | O(1) |
prepend(item) | O(n) | O(1) |
insert(index, item) | O(n) | O(n) |
remove(index) | O(n) | O(n) |
get(index) | O(1) | O(n) |
size() | O(1) | O(1) |
Space | O(n) | O(n) |
Stacks and queues
Stacks and queues are abstract data structures that support limited access to their elements: they allow only appending and retrieval. They are distinguished by the particular retrieval order they support: stack is LIFO container ("last-in-first-out") and queue is FIFO container ("first-in-first-out"). Interfaces of stack and queue are presented below:
package com.dancingrobot84.algorithms.datastructures;
public interface Stack<T> {
/** Insert new item at the end of stack **/
void push(T item);
/** Retrieve and remove item from the end of stack **/
T pop();
/** Return number of elements in stack **/
int size();
}
package com.dancingrobot84.algorithms.datastructures;
public interface Queue<T> {
/** Insert new item at the end of queue **/
void enqueue(T item);
/** Retrieve and remove item from the beginning of queue **/
T dequeue();
/** Return number of elements in queue **/
int size();
}
Both stacks and queues are widely applicable in modern software engineering. Examples of stack usage are:
- Function calls are stored in stack while program is executed
- Many parsing and expression evaluation algorithms use stacks to hold values or currently processed lexemes
- JVM is a stack-based virtual machine. It does not use registers; instead it stores all variables on stack
Examples of queue usage are:
- Concurrent applications; for example, concurrent access to some resource means that there is a queue of requests and they are processed in order
- Buffering and data transfer
Usually stacks and queues are implemented as adapters to another containers,
for example, linked lists or dynamic arrays. In Java Queue
is an interface
implemented by classes ArrayDeque
, LinkedList
and others. Stack
is
a concrete class, but it is obsolete. It is recommended to use interface
Deque
instead and its implementations as it has methods of both stacks and
queues.
Stack implementation
Explanations skipped as the implementation is quite straightforward.
package com.dancingrobot84.algorithms.datastructures.impl;
import com.dancingrobot84.algorithms.datastructures.Collection;
import com.dancingrobot84.algorithms.datastructures.Stack;
public class ArrayStack<T> implements Stack<T> {
public static <T> Stack<T> empty() {
return new ArrayStack<T>();
}
@Override
public void push(T item) {
items.append(item);
}
@Override
public T pop() {
assert items.size() > 0 : "Stack is empty";
int index = items.size() - 1;
T item = items.get(index);
items.remove(index);
return item;
}
@Override
public int size() {
return items.size();
}
private Collection<T> items = DynamicArray.empty();
private ArrayStack() {
// Disable subclassing
}
}
Implementation of stack using doubly-linked list will be exactly the same.
Complexity of operations is O(1) amortized
for
array and O(1)
for list.
Queue implementation using dynamic array
Idea: this is standard two-pointer implementation of a queue.
Because in queue we need to retrieve element from the beginning of a queue, it
is inefficient to shift the whole array back each time dequeue
is called.
Instead lets create a pointer to location where the real beginning of a queue is
and shift it forward on dequeue
. On enqueue
, however, lets insert new item
not just at the end of an array, but, if possible, in the free place between
array's beginning and queue's beginning. In order to achieve this lets create
another pointer to store location of the real end of a queue. When this two
pointers meet lets relocate the whole array so that queue's beginning and end
will be the same as underlying array's are.
package com.dancingrobot84.algorithms.datastructures.impl;
import com.dancingrobot84.algorithms.datastructures.Queue;
public class ArrayQueue<T> implements Queue<T> {
public static <T> Queue<T> empty() {
return new ArrayQueue<T>();
}
@Override
public void enqueue(T item) {
if (items.size() == queueEnd) {
if (queueStart == 0) {
items.append(item);
} else {
queueEnd = 0;
items.set(0, item);
}
queueEnd++;
} else {
items.set(queueEnd, item);
queueEnd++;
if (queueEnd == queueStart) {
relocate();
}
}
}
@Override
public T dequeue() {
assert queueEnd != queueStart : "Queue is empty";
T item;
item = items.get(queueStart);
queueStart++;
if (queueStart == queueEnd) {
queueStart = 0;
queueEnd = 0;
} else if (queueStart == items.size()) {
queueStart = 0;
}
return item;
}
@Override
public int size() {
if (queueEnd == queueStart) {
return 0;
} else if (queueEnd > queueStart) {
return queueEnd - queueStart;
} else {
return items.size() - (queueStart - queueEnd);
}
}
@SuppressWarnings("unchecked")
private DynamicArray<T> items = (DynamicArray<T>) DynamicArray.empty();
private int queueStart = 0;
private int queueEnd = 0;
private ArrayQueue() {
// Disable subclassing
}
private void relocate() {
@SuppressWarnings("unchecked")
DynamicArray<T> newItems = (DynamicArray<T>) DynamicArray.empty();
queueEnd--;
for (int i = queueStart; i != queueEnd; i = (i+1) % items.size()) {
newItems.append(items.get(i));
}
newItems.append(items.get(queueEnd));
items = newItems;
queueStart = 0;
queueEnd = items.size();
}
}
Complexity of this solution is O(1) amortized
for both operations.
Queue implementation using doubly-linked list
This implementation is much easier, because in doubly-linked list we can
remove item from both beginning and end in O(1)
.
package com.dancingrobot84.algorithms.datastructures.impl;
import com.dancingrobot84.algorithms.datastructures.Collection;
import com.dancingrobot84.algorithms.datastructures.Queue;
public class ListQueue<T> implements Queue<T> {
public static <T> Queue<T> empty() {
return new ListQueue<T>();
}
@Override
public void enqueue(T item) {
items.append(item);
}
@Override
public T dequeue() {
assert items.size() > 0 : "Queue is empty";
T item = items.get(0);
items.remove(0);
return item;
}
@Override
public int size() {
return items.size();
}
private Collection<T> items = DoublyLinkedList.empty();
private ListQueue() {
// Disable subclassing
}
}
Complexity of this solution is O(1)
for both operations.
-
Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms
-
Skiena. The Algorithm Design Manual