Vamshi Jandhyala - Counting triangles in a graph

Why is counting triangles important

Triangle counting is an important problem in graph mining. Two frequently used metrics in complex network analysis that require the count of triangles are the clustering coefficients and the transitivity ratio of the graph. Triangles have been used successfully in several real-world applications, such as detection of spamming activity, uncovering the hidden thematic structure of the web and link recommendation in online social networks.

Using brute force

A triangle in a graph is a set of three vertices $u, v, w$, where every two of them connected by an edge. An obvious algorithm for finding a triangle inspects every triple of vertices, and thus it needs roughly $n^3$ operations for an $n$-vertex graph (there are ${n \choose 3}$ triples to look at, and ${n \choose 3}$ is approximately $n^3/6$ for large $n$).

Using matrix multiplication

There is a faster method for breaking the $n^3$ barrier which is algebraic and based on fast matrix multiplication.

For notational convenience let the vertex set of the given graph $G$ be labelled as $\{0, 1, 2, . . ., n-1\}$. The adjacency matrix of $G$ is the $n\times n$ matrix $A$ with

\[ a_{ij} = \begin{cases} 1 & \text{ if $i \neq j$ and $\{i, j\} ∈ E(G)$}, \\ 0 & \text{ otherwise}. \end{cases} \]

The key insight is to understand the square $B := A^2$. By the definition of matrix multiplication we have $b_{ij} = \sum_{k=1}^n a_{ik}a_{kj}$ , and

\[ b_{ij} = \begin{cases} 1 & \text{if the vertex $k$ is adjacent to both $i$ and $j$}, \\ 0 & \text{ otherwise}. \end{cases} \]

So $b_{ij}$ counts the number of common neighbors of $i$ and $j$.

Finding a triangle is equivalent to finding two adjacent vertices $i, j$ with a common neighbor $k$. So we look for two indices $i, j$ such that both $a_{ij}\neq 0$ and $b_{ij} \neq 0$. Summing up all the $b_{ij}$s where both $a_{ij}$ and $b_{ij}$ are non zero gives us $6$ times the number of triangles in the graph as the same triangle is counted twice for each of the three pairs of indices comprising the triangle.

If we compute the matrix $B = A^2$ according to the definition, we need about $n^3$ arithmetic operations and thus we save nothing compared to the naive method of inspecting all triples of vertices. However, ingenious algorithms are known that multiply $n \times n$ matrices asymptotically faster. The oldest one, due to Strassen, needs roughly $n^{2.807}$ arithmetic operations.

Using the cube of the adjancency matrix

The cube of the adjancency matrix $A$ gives us the number of paths of length $3$ between any pair of vertices. We need to look at the trace of the matrix so that we get the number of triangles in the graph. The trace has to be divided by $6$ because each triangle is counted twice for each vertex in the triangle.

Using node-iteration

In this algorithm we iterate over all nodes and tests for each pair of neighbors if they are connected by an edge. The asymptotic running time is given by the expression $\sum_{v \in V} {d(v) \choose 2}$ and is bounded by $O(nd^2_{max})$. Here each triangle is counted thrice, once for each node.

Python code for counting triangles

import networkx as nx
from itertools import combinations, product
import numpy as np

G = nx.erdos_renyi_graph(20, 0.5, seed=123, directed=False)

def count_triangles_brute_force(G):
    cnt_triangles = 0
    for i,j,k in combinations(G.nodes, 3):
        if G.has_edge(i,j) and G.has_edge(j,k) and G.has_edge(i,k):
            cnt_triangles += 1
    return cnt_triangles

def count_triangles_mat_mul(G):
    cnt_triangles = 0
    A = nx.to_numpy_matrix(G)
    B = A.dot(A)
    for i,j in product(G.nodes, G.nodes):
        if G.has_edge(i,j) and B[i,j] != 0:
            cnt_triangles += B[i,j]
    return cnt_triangles/6

def count_triangles_mat_mul2(G):
    A = nx.to_numpy_matrix(G)
    return np.trace(np.linalg.matrix_power(A,3))/6

def count_triangles_node_iterator(G):
    cnt_triangles = 0
    for i in G.nodes:
        for j,k in combinations(G.neighbors(i),2):
            if G.has_edge(j,k):
                cnt_triangles += 1
    return cnt_triangles/3

print(count_triangles_brute_force(G))
print(count_triangles_mat_mul(G))
print(count_triangles_node_iterator(G))
print(count_triangles_mat_mul2(G))

Listing triangles using edge-iteration

The algorithm edge-iterator iterates over all edges and compares the adjacency data structure of both incident nodes. For an edge $\{u, w\}$ the nodes $\{u, v, w\}$ induce a triangle if and only if node $v$ is present in both adjacency data structures $Adj(u)$ and $Adj(w)$.The running time can then be expressed with

\[ \sum_{\{u,w\} \in E} d(u) + d(w). \]

Python code for listing triangles

def list_triangles_edge_iteration(G):
    triangles = set()
    for u,w in G.edges:
        for v in set(G.neighbors(u)).intersection(G.neighbors(w)):
            triangles.add(frozenset([u,v,w]))
    return triangles

print(list_triangles_edge_iteration(G))

References

Thirty-three Miniatures: Mathematical and Algorithmic Applications of Linear Algebra

Finding, Counting and Listing all Triangles in Large Graphs, An Experimental Study