5 min read

Counting triangles in a graph

Table of Contents

Why is counting triangles important

Triangle counting is an important problem in graph mining. Two frequently used metrics in complex network analysis that require the count of triangles are the clustering coefficients and the transitivity ratio of the graph. Triangles have been used successfully in several real-world applications, such as detection of spamming activity, uncovering the hidden thematic structure of the web and link recommendation in online social networks.

Using brute force

A triangle in a graph is a set of three vertices u,v,wu, v, w, where every two of them connected by an edge. An obvious algorithm for finding a triangle inspects every triple of vertices, and thus it needs roughly n3n^3 operations for an nn-vertex graph (there are (n3){n \choose 3} triples to look at, and (n3){n \choose 3} is approximately n3/6n^3/6 for large nn).

Using matrix multiplication

There is a faster method for breaking the n3n^3 barrier which is algebraic and based on fast matrix multiplication.

For notational convenience let the vertex set of the given graph GG be labelled as {0,1,2,...,n−1}\{0, 1, 2, . . ., n-1\}. The adjacency matrix of GG is the n×nn\times n matrix AA with

aij={1 if i≠j and {i,j}∈E(G),0 otherwise. a_{ij} = \begin{cases} 1 & \text{ if $i \neq j$ and $\{i, j\} ∈ E(G)$}, \\ 0 & \text{ otherwise}. \end{cases}

The key insight is to understand the square B:=A2B := A^2. By the definition of matrix multiplication we have bij=∑k=1naikakjb_{ij} = \sum_{k=1}^n a_{ik}a_{kj} , and

bij={1if the vertex k is adjacent to both i and j,0 otherwise.b_{ij} = \begin{cases} 1 & \text{if the vertex $k$ is adjacent to both $i$ and $j$}, \\ 0 & \text{ otherwise}. \end{cases}

So bijb_{ij} counts the number of common neighbors of ii and jj.

Finding a triangle is equivalent to finding two adjacent vertices i,ji, j with a common neighbor kk. So we look for two indices i,ji, j such that both aij≠0a_{ij}\neq 0 and bij≠0b_{ij} \neq 0. Summing up all the bijb_{ij}s where both aija_{ij} and bijb_{ij} are non zero gives us 66 times the number of triangles in the graph as the same triangle is counted twice for each of the three pairs of indices comprising the triangle.

If we compute the matrix B=A2B = A^2 according to the definition, we need about n3n^3 arithmetic operations and thus we save nothing compared to the naive method of inspecting all triples of vertices. However, ingenious algorithms are known that multiply n×nn \times n matrices asymptotically faster. The oldest one, due to Strassen, needs roughly n2.807n^{2.807} arithmetic operations.

Using the cube of the adjancency matrix

The cube of the adjancency matrix AA gives us the number of paths of length 33 between any pair of vertices. We need to look at the trace of the matrix so that we get the number of triangles in the graph. The trace has to be divided by 66 because each triangle is counted twice for each vertex in the triangle.

Using node-iteration

In this algorithm we iterate over all nodes and tests for each pair of neighbors if they are connected by an edge. The asymptotic running time is given by the expression ∑v∈V(d(v)2)\sum_{v \in V} {d(v) \choose 2} and is bounded by O(ndmax2)O(nd^2_{max}). Here each triangle is counted thrice, once for each node.

Python code for counting triangles

import networkx as nx
from itertools import combinations, product
import numpy as np

G = nx.erdos_renyi_graph(20, 0.5, seed=123, directed=False)

def count_triangles_brute_force(G):
    cnt_triangles = 0
    for i,j,k in combinations(G.nodes, 3):
        if G.has_edge(i,j) and G.has_edge(j,k) and G.has_edge(i,k):
            cnt_triangles += 1
    return cnt_triangles

def count_triangles_mat_mul(G):
    cnt_triangles = 0
    A = nx.to_numpy_matrix(G)
    B = A.dot(A)
    for i,j in product(G.nodes, G.nodes):
        if G.has_edge(i,j) and B[i,j] != 0:
            cnt_triangles += B[i,j]
    return cnt_triangles/6

def count_triangles_mat_mul2(G):
    A = nx.to_numpy_matrix(G)
    return np.trace(np.linalg.matrix_power(A,3))/6

def count_triangles_node_iterator(G):
    cnt_triangles = 0
    for i in G.nodes:
        for j,k in combinations(G.neighbors(i),2):
            if G.has_edge(j,k):
                cnt_triangles += 1
    return cnt_triangles/3

print(count_triangles_brute_force(G))
print(count_triangles_mat_mul(G))
print(count_triangles_node_iterator(G))
print(count_triangles_mat_mul2(G))

Listing triangles using edge-iteration

The algorithm edge-iterator iterates over all edges and compares the adjacency data structure of both incident nodes. For an edge {u,w}\{u, w\} the nodes {u,v,w}\{u, v, w\} induce a triangle if and only if node vv is present in both adjacency data structures Adj(u)Adj(u) and Adj(w)Adj(w).The running time can then be expressed with

∑{u,w}∈Ed(u)+d(w).\sum_{\{u,w\} \in E} d(u) + d(w).

Python code for listing triangles

def list_triangles_edge_iteration(G):
    triangles = set()
    for u,w in G.edges:
        for v in set(G.neighbors(u)).intersection(G.neighbors(w)):
            triangles.add(frozenset([u,v,w]))
    return triangles

print(list_triangles_edge_iteration(G))

References

Thirty-three Miniatures: Mathematical and Algorithmic Applications of Linear Algebra

Finding, Counting and Listing all Triangles in Large Graphs, An Experimental Study