Graph data structures are incredibly important and exist in applications that we use daily! Think Google Maps, Uber, Facebook. This article will help you grasp them better and introduce you to basic graph traversal.
What’s a graph? Let’s start off with a nice concise definition: A graph is a set of vertices or nodes connected through edges. A graph can be directed or undirected, it can be cyclic or acyclic, it can be weighted or unweighted. A graph is directed if there is direction between two connected nodes, you can tell a graph is directed in its visual representation when there are arrows on the edges. Undirected graphs have no distinction between direction, for example Node A -> Node B is the same as Node B -> Node A in an undirected graph. A weighted graph can represent arbitrary values such as cost, distance etc. through its edges.
There’s a few special types of graphs: Trees, Rooted Trees, DAGS and Complete Graphs.
- Trees — An undirected graph with no weighted edges. There are always N nodes with N-1 edges.
- Rooted Tree — A tree with a designated tree node, where every edge either points away from or points toward the root. Trees where the direction moves out from the root node, the tree is displaying arborescence. When the direction moves in towards the root node, the tree is displaying anti-arborescence.
- DAGS (Directed Acyclic Graph) — Directed graphs with no cycles. A type of DAG is a bipartite graph. Bipartite graphs are graphs whose vertices can be split into two groups U and V such that every edge connects U and V. This means that the graph is two colorable or does not have an odd length cycle.
- Complete Graph — A graph that has a unique edge between all vertices.
An adjacency list will have a map that has keys for all the vertices of a graph. The values of those keys represent an array/list of other vertices that have a connection with that node. You can figure out all the edges of a graph by iterating through the adjacency list and then iterating through the respective arrays. The pros of using an adjacency list are that it’s great for sparse graphs, efficient for iterating over all edges. The cons are that it is not space efficient for denser graphs, edge lookup is slow, and it has a slightly more complex representation than its counterparts.
An adjacency matrix is a 2-D matrix that is N x N in dimension. Each entry in the matrix represents if an edge exists between those two vertices. For example, matrix[i][j] = true means that an edge exists between vertices i and j, if it were false then that would mean that an edge doesn’t exist. In place of true/false values you can also have a value that represents the weight of an edge if it were a weighted graph. The pros of an adjacency matrix are that it is space efficient and edge lookup is constant, it is the simplest graph representation. The cons are that a matrix is n² space & time.
An edge list is usually an unordered list/array of vertex pairs or pairs + weight. The pros are that it is great for sparse graphs, iterating is efficient and it has a simple structure. The cons are that it is less space efficient and edge lookup is slow. This method of representing graphs isn’t very common, therefore I’ll only be showing adjacency list and matrix implementations.
What are the types of problems that are represented/solved by graphs?
Graphs can be used to solve a whole host of problems, the study of graphs and their algorithms is called Graph theory. The types of problems solved with graphs are the following:
- Shortest Path — The shortest distance between two nodes on a graph. Does google maps ring a bell?
- Connectivity — If a path exists from one node to another.
- Strongly Connected Components — If a group of vertices are all connected with each other. Social groups can be represented through strongly connected components, what is a tightly knit group of friends but a bunch of vertices that have edges between each other? 😂
- Traveling Salesman Problem — Given a list of cities, what is the shortest path where you visit every city starting from a source and returning to that source?
- Bridges — Edges when removed increase the number of connected components.
- Articulation Points — Vertices when removed increase the number of connected components.
- Minimum Spanning Tree — Finds the minimal subtree that connects all vertices of a graph.
- Network Flow — Finds the max flow that you can send through a system by constructing a max-flow graph. Starts from a source node and moves towards a sink node.
Most of the problems listed above are solved through one of the many different graph algorithms. In later articles, I’ll be covering various graph algorithms and their JS implementations. The majority of the more complex graph algorithms have either depth first search (DFS) or breadth first search (BFS) as their underlying mode of graph traversal. Below, I’ll do a high-level overview of DFS and BFS and then go into implementation afterward.
Depth first search is a form of graph traversal that uses the last-in-first-out (LIFO) paradigm. In a depth first search, you are aiming for depth, the traversal moves through the levels of a tree/graph before it back-tracks. To achieve this, you’re using a stack — be it explicitly in an iterative approach or implicitly through the call stack in a recursive approach. You’re starting off with the ‘start’ node and adding its neighbors to the stack, at which point you pop off another node from the stack and explore its neighbors. If you’re only ever popping off the last node, you’re traversing through the levels of a tree/graph on every iteration or call of that depth first search.
Breadth first search is a form of graph traversal that uses the first-in-first-out (FIFO) paradigm. In a breadth first search, you are aiming for breadth, the traversal explores the level of a tree/graph entirely before moving on to the next level. Due to the nature of the search, a BFS is only done iteratively. DFS lends itself well to recursion because of the nature of the call stack, however, in BFS the only data structure that works is a queue, because a queue is FIFO.
Using either DFS or BFS for graph traversal is entirely dependent on the structure of a graph and the type of problem that you’re trying to solve. The time complexity of DFS/BFS is O(V + E) when using an adjacency list and O(V²) when using an adjacency matrix, where V is the number of vertices and E is the number of edges. In an adjacency list, you only ever store information about the actual edges that exist and in the worst case you iterate over all the vertices and all the edges to find the target node. In a matrix, the size of the matrix is V x V, where V is the number of vertices. In order to search for a target node, you essentially traverse through V arrays with the size of V, which gives you a time complexity of O(V²).
- Constructor — The first thing to do is create a class called Graph and initialize it with an adjacency list or matrix. The adjacency list is usually either a map or an ‘object’ in JS and the matrix is an array of arrays. If you’re using a matrix you’re going to have to instantiate the graph with the number of nodes you wish to have in the graph.
2. Add Vertex — The second thing that you want to set up is a method to add vertices. If you’re using an adjacency matrix, then you don’t need to use this method as the vertices are already ‘initialized’ upon the instantiation of the graph. In the adjacency list approach, to add a vertex, you just create a new key with the value of an empty array.
3. Add Edge — The next logical thing is adding connections between our vertices! This is simple for adjacency matrices and slightly more complex for adjacency lists. The first image shows the adjacency list way of adding edges and the second image shows the matrix method. In the adjacency list method, I’m also calling the add vertex method on the vertices that constitute the edge as a precaution to them not existing in the first place. This addEdge method is creating an undirected edge, if you wanted to create a directed edge, you would only push in v2 to v1’s array or vice versa.
3. Remove Edge — If we can add edges, we should also be able to remove edges! To remove an edge in an adjacency list that represents undirected edges, as in the image below, we have to filter it out of the respective vertex’s array of edges. In a matrix, it’s easy as setting that respective entry to false.
4. Remove Vertex — We should also be able to remove a single node from our graph. In order to remove a vertex in an adjacency list, we would have to iterate over that particular vertex’s edges and call removeEdge on them. Once that is done, we would delete the vertex’s key from the adjacency list map/object. To delete a vertex from an adjacency matrix is a bit different and opens up some interesting questions you have to ask yourself before proceeding with a particular method. If you wish to preserve the indices with which you’re referencing your vertices, you can potentially set the array at matrix[i] to be null to represent the vertex being null and then iterate over every other row and set the column j to be a null value (matrix[i][j] = null). If you don’t care to maintain an original numbering of the vertices then you can just filter out the vertex’s row from the matrix and then filter out the vertex from the remaining rows.
NOTE: DFS and BFS methods below will only be implemented based on adjacency lists. At this point you should have an idea how adjacency matrices work and should be able to alter the below methods to cater to a matrix.
5. Depth First Search (Recursive) — Remember what I mentioned earlier in the article? This approach will be using the call stack as our ‘stack’. We first initialize a results variable which will hold the result of our DFS search, then we initialize a variable to keep track of the nodes that’ve been visited. We also need to cache the adjacency list into its own variable within the method so that we don’t have to worry about who this is when our DFS function refers to the adjacency list. The DFS function is an IIFE (immediately invoked function expression), it is invoked with the start node. The function first checks if the node passed in is valid, if it isn’t then null is returned. We mark our node as visited and then push it into our results array. We then iterate over that vertex’s edges and recursively call the DFS function on a neighbor IF it hasn’t been visited. At the end we can return our results array with a DFS ordering of our vertices.
6. Depth First Search (Iterative) — In the iterative approach we are going to explicitly use a stack. Similar to the recursive approach, I have a result array and a visited map/object. I initialize the stack with the start node in it and mark it as visited. I then use a while loop to keep looping while the stack has elements in it, in every iteration I pop a node and push it into the results array. I iterate over that node’s children, if a child is unvisited, I visit it and add it to the stack. I break out of the loop once the stack is empty and then return our result array.
7. Breadth First Search — The BFS approach is going to be almost the same as the DFS iterative approach. Instead of using a stack we’re going to initialize a queue with the start variable. For the sake of simplicity, I’m going to use an array for the queue, but in practice you should be using a data structure such as a linked list to model a queue. Shifting from a regular array is an O(n) operation whereas in a linked list/queue it would be O(1). I also visit the start node before jumping into the while loop. The loop keeps executing while there are elements in the queue. Inside the while loop, a node is shifted from the queue on every iteration. That node’s children are explored and if a child node hasn’t been visited, it is marked as visited and pushed into the queue. In the end, after the loop terminates, we return the result array.
Stay tuned for more articles in the DS & A series. I will be releasing articles on advanced graph algorithms in the future!
Please feel free to reach out with questions/comments!