Graph-Based Recommender Systems

In the digital age, recommender systems have become indispensable tools for personalizing user experiences across platforms like Spotify for music or video on demand streaming services. These systems predict user preferences by analysing patterns in user behaviour, item attributes and contextual signals.

While traditional approaches like collaborative filtering (CF) and content-based filtering have proven effective, they often struggle with challenges like data sparsity, cold-start problems or lack of explainability.

Enter graph-based recommender systems, a paradigm that leverages the inherent relational structure of data to enhance recommendation quality. By modelling users, items, and their interactions as a graph, these systems can capture complex dependencies which traditional methods might miss.

For example, a graph can represent not only direct user-item interactions (e.g., a user rating a movie) but also indirect relationships (e.g., two users who like similar genres or an item connected to a popular actor). This flexibility allows graph-based RS to address sparsity, improve explainability, and integrate heterogeneous data sources seamlessly.

Graph Modelling — The Backbone of Graph-Based Recommendations

What's a graph? A graph is a mathematical structure $G$ composed of two primary components $G = (V, E)$

  • Vertices $V$ (Nodes): Represent entities such as users, items, or attributes (e.g. genres, actors).
  • Edges $E$: Represent relationships between these entities, such as user-item interactions, social connections or semantic links.

In the context of recommender systems, graphs can be directed (e.g., a user follows another user) or undirected (e.g., mutual friendships), and they can be weighted (e.g., a 5-star rating carries more weight than a 1-star rating) or unweighted (e.g., a binary "like" interaction).

Graphs as Adjacency Matrix

One of the most common ways to represent a graph computationally is through an adjacency matrix $A$.

For a graph with $n$ vertices, $A$ is an $n \times n$ matrix where:

  • $A_{ij} = 1$ if there is an edge between vertices $v_i$ and $v_j$.
  • $A_{ij} = 0$ otherwise.

For weighted graphs, $A_{ij}$ can take any non-negative value representing the strength of the relationship. For example, in a user-item interaction graph, $A_{ui}$ might represent the rating a user $u$ gave to an item $i$.

Consider a simple bipartite graph with 3 users and 4 items, where an edge indicates that a user has interacted with (e.g., purchased or rated) an item.

Exemplary User-Item Matrix

Here, $A_{12} = 1$ indicates that User 1 interacted with Item 2, while $A_{34} = 0$ indicates no interaction between User 3 and Item 4.

Types of Graphs in Recommender Systems

Graphs in recommender systems can take various forms depending on the relationships they model

User-Item Bipartite Graphs

In user-item bipartite graphs users and items are both represented by nodes, of different types, whereas interactions such as ratings, clicks, purchases, are represented as edges.

Exemplary User-Item Bipartite Graph (matching User-Item Matrix above)

Use Case: Collaborative filtering, where recommendations are based on similar user or items.

User-User Graphs (Social Networks)

In a user-user graph, users are of course represented by nodes, and interactions or relationships (e.g. friendships, followerships) are represented as edges.

Exemplary User-User Graph

Use Case: Leveraging social influence for recommendations (e.g. "friends who liked this also liked ...")

Item-Item Graphs

In item-item garphs items are represented by items, edges indicate similarity relationships like co-purchased items, or being of the same genre.

Exemplary Item-Item Graphs

Use Case: Item-based collaborative filtering.

Knowledge Graphs (KGs)

In knowledge graphs core entities or attributes are represented by nodes, such as users, items, but also attributes themselves (e.g. genres, actors, directors, ...). Edges represent semantic relationships between these different types of nodes (e.g. "actor X stars in movie Y").

Exemplary Knowledge Graph (Nodes of Multiple Types linked via Semantic Relationship Edges)

Use Case: Hybrid recommendations combining collaborative and content-based signals.

Why Graphs for Recommendations?

Graphs offer several advantages for recommender systems:

  • Flexibility: Graphs can model diverse relationships, from simple interactions to complex semantic connections.
  • Explainability: Recommendations can be explained through paths in the graph (e.g., "User A likes Item X because they are connected via Genre Y").
  • Handling Sparsity: By leveraging indirect relationships (e.g., transitivity), graphs can mitigate the sparsity issue in user-item interaction matrices.
  • Unified Framework: Heterogeneous data sources (e.g., social networks, item attributes) can be integrated into a single graph structure.

Graph-Based Transitivity — Leveraging Indirect Relationships

One of the most significant challenges in collaborative filtering is data sparsity. In real-world scenarios, users interact with only a tiny fraction of available items, leading to sparse user-item interaction matrices. This sparsity makes it difficult to compute reliable similarity measures between users or items, often resulting in poor recommendation quality.

Transitivity refers to the idea that if a user $u_1$ is connected to an item $i_1$, and $i_1$ is connected to another user $u_2$ who is connected to item $i_2$, then $u_1$ might also be interested in $i_2$. In graph terms, this corresponds to paths between users and items that are not directly connected.

Mathematically, we can model transitivity by considering all possible paths of length up to $M$ between a user and an item. The association strength between a user $u$ and an item $i$ is then computed as the sum of the contributions of all such paths, weighted by a decay factor $\alpha$ that prioritizes shorter paths:

$$a_{ui} = \sum_{l=1}^{M} \alpha^l \cdot \text{number of paths of length } l \text{ from } u \text{ to } i$$

Here:

  • $\alpha \in [0, 1]$ is a decay factor that reduces the contribution of longer paths.
  • $M$ is the maximum path length considered (typically an odd number to ensure paths alternate between users and items).

Example: Path-Based Recommendations

Let's consider a simple example based on the already presented user-item bipartite graphs of 3 users and 4 items.

For $M = 3$ and $\alpha = 0.5$, we can compute the association strength between User 1 and Item 3 as follows:

Find unique paths from User 1 to Item 3 of max length $M=3$

  • User 1 → Item 2 User 2 Item 3
  • User 1 Item 4 User 2 Item 3

There are no 1-hop direct paths, paths of length 2 would only lead back to a user, but we found 2 paths of length 3 from User 1 to Item 3. With a decay factor $\alpha = 0.5$, we find:

$$a_{13} = 0.5 ^3 + 0.5^3 = 0.125 + 0.125 = 0.25$$

This means that even though User 1 has not directly interacted with Item 3, the transitivity-based approach suggests a potential interest, which a traditional collaborative filtering method might miss.

Spreading Activation: Efficient Path Calculation

Calculating all possible paths in a large graph is computationally expensive. Spreading activation is an efficient alternative that simulates the propagation of influence or interest through the graph.

  1. Initialization: Assign an initial activation value to seed nodes (e.g., items a user has interacted with).
  2. Propagation: Iteratively spread activation to neighboring nodes, decaying the activation by $\alpha$ at each step.
  3. Aggregation: Sum the activation values at each node to compute the final association strengths.

Let $\mathbf{a}^{(t)}$ be the activation vector at iteration $t$. The update rule is:

$$\mathbf{a}^{(t+1)} = \alpha \cdot A \cdot \mathbf{a}^{(t)}$$

where $A$ is the adjacency matrix of the graph.

The process continues until the activation values converge or a maximum number of iterations is reached.

Transitivity-based methods are particularly effective in sparse settings, where direct interactions are limited. However, in dense settings (where users have many interactions), the additional paths may introduce noise, potentially degrading recommendation quality. Therefore, the choice of $\alpha$ and $M$ should be tuned based on the sparsity of the data.

Social Network Mining — Harnessing User Relationships

Social networks provide a rich source of information about user relationships, which can be leveraged to enhance recommender systems. For example, friends on a social platform often share similar tastes, and their interactions can serve as proxies for a user’s preferences.

Additionally, social networks can help identify influential users whose recommendations carry more weight or communities of users with shared interests.

Edge Measures: Quantifying Pairwise Relationships

Edge measures focus on the local structure of the graph, quantifying the strength or importance of individual edges (i.e., relationships between pairs of users). These measures can be used to replace or enhance traditional similarity metrics in collaborative filtering.

Tie Strength

Tie strength measures the overlap between the neighborhoods of two users, reflecting how "embedded" their relationship is within the network. A common metric for tie strength is the Jaccard coefficient, defined as:

$$J(A, B) = \frac{|N(A) \cap N(B)|}{|N(A) \cup N(B)|}$$

where $N(A)$ and $N(B)$ are the sets of neighbors of users $A$ and $B$, respectively.

Suppose User A has neighbors {User C, User D, User E}, and User B has neighbors {User D, User E, User F}. The Jaccard coefficient is:

$$J(A, B) = \frac{|\{D, E\}|}{|\{C, D, E, F\}|} = \frac{2}{4} = 0.5$$

A high Jaccard coefficient suggests that Users A and B share many mutual friends, which can be interpreted as a measure of trust or similarity.

Triadic Closure and Clustering Coefficient

Triadic closure refers to the tendency for two friends of a user to also be friends with each other. The clustering coefficient quantifies this tendency, measuring the fraction of a user’s friends that are also connected to each other.

For a user $A$ with $k$ friends, the clustering coefficient $C_A$ is:

$$C_A = \frac{\text{number of edges between } A\text{'s friends}}{k \cdot (k - 1) / 2}$$

On a global level, the clustering coefficient $C$ is defined as:

$$C = \frac{3 \times \text{number of triangles in the graph}}{\text{number of connected triples}}$$

A triangle is a set of three users who are all connected to each other, and a connected triple is a set of three users where at least one edge is missing.

Triadic Closure / Clustering Coefficient give insights into the network like

  • High clustering coefficients indicate homophily (users tend to connect with similar users), which can be exploited for similarity-based recommendations.
  • Low clustering coefficients may suggest a more diverse network, where recommendations should account for a broader range of influences.

Edge Betweenness: Identifying Critical Connections

Edge betweenness measures the extent to which an edge lies on the shortest paths between all pairs of nodes in the graph. Edges with high betweenness are often bridges that connect different communities or clusters.

Mathematically, the betweenness of an edge $e$ is:

$$\text{Betweenness}(e) = \sum_{i \neq j} \frac{\sigma_{ij}(e)}{\sigma_{ij}}$$

where:

  • $\sigma_{ij}$ is the total number of shortest paths between nodes $i$ and $j$.
  • $\sigma_{ij}(e)$ is the number of shortest paths between $i$ and $j$ that pass through edge $e$.

Edges with high betweenness are critical for information flow in the network. In recommender systems, these edges can help identify influential connections that bridge different user groups, enabling more diverse recommendations.

Node Measures: Quantifying User Importance

While edge measures focus on pairwise relationships, node measures assess the global importance of individual users within the network. These measures can be used to adjust the influence of users in collaborative filtering, ensuring that recommendations are not dominated by a few highly active users.

Degree Centrality: The Simplest Measure of Influence

Degree centrality is the simplest node measure, defined as the number of connections a node has. For a user $i$ in an undirected graph:

$$\text{Degree}(i) = \sum_{j=1}^{|V|} A_{ij}$$

In a directed graph (e.g., a followership network), we distinguish between:

  • Out-degree: Number of edges pointing from $i$ to others.
  • In-degree: Number of edges pointing to $i$ from others.

Users with high degree centrality are influential and may serve as good neighbors in collaborative filtering.

However, degree centrality does not account for the quality of connections (e.g., a user with many weak ties may not be a good recommender).

Katz Centrality: Accounting for Indirect Influence

Katz centrality extends degree centrality by considering not only direct connections but also indirect paths, while penalizing longer paths to avoid overcounting distant influences.

The Katz centrality $c_i$ for a user $i$ is given by:

$$\mathbf{c} = \alpha (I - \beta A)^{-1} \mathbf{1}$$

where:

  • $\alpha$ is a normalization constant.
  • $\beta$ is a damping factor ($0 < \beta < 1$) that controls the influence of distant nodes.
  • $I$ is the identity matrix.
  • $\mathbf{1}$ is a vector of ones.

In practice, Katz centrality is computed iteratively:

$$c_i^{(t+1)} = \alpha \sum_{j} A_{ij} c_j^{(t)} + \beta$$

Katz centrality is particularly useful for modeling diffusion processes, such as the spread of information or influence in a social network. In recommender systems, it can help identify users whose preferences are likely to propagate through the network.

Closeness Centrality: Measuring Accessibility

Closeness centrality measures how close a user is to all other users in the network, based on the shortest-path distances. A user with high closeness centrality can quickly reach or influence many other users.

The closeness centrality $C_i$ for a user $i$ is:

$$C_i = \frac{1}{\sum_{j \neq i} d(i, j)}$$

where $d(i, j)$ is the shortest-path distance between users $i$ and $j$.

Users with high closeness centrality are central hubs in the network and can serve as effective seed users for spreading recommendations.

However, computing closeness centrality is expensive for large graphs, as it requires calculating shortest paths between all pairs of nodes.

Betweenness Centrality: Identifying Key Mediators

Betweenness centrality quantifies the extent to which a user lies on the shortest paths between other users. Users with high betweenness act as brokers or gatekeepers in the network, connecting different clusters or communities.

The betweenness centrality $B_i$ for a user $i$ is:

$$B_i = \sum_{j \neq i \neq k} \frac{\sigma_{jk}(i)}{\sigma_{jk}}$$

where:

  • $\sigma_{jk}$ is the total number of shortest paths between users $j$ and $k$.
  • $\sigma_{jk}(i)$ is the number of shortest paths between $j$ and $k$ that pass through $i$.

Users with high betweenness centrality are critical for information flow and can help bridge gaps between different user groups. In recommender systems, these users can be leveraged to diversify recommendations by connecting disparate communities.

Integrating Social Network Measures into Recommender Systems

Social network measures can enhance traditional collaborative filtering in two primary ways: Enhancing User Similarity or Adjusting User Influence.

Enhancing User Similarity

Replace or augment the similarity function in user-based CF with edge measures (e.g., Jaccard coefficient for tie strength).

Example: The predicted rating $\hat{r}_{ui}$ for user $u$ and item $i$ can be computed as:

$$\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{k \in N_u} \text{sim}(u, k) \cdot (r_{ki} - \bar{r}_k)}{\sum_{k \in N_u} \text{sim}(u, k)}$$

where $\text{sim}(u, k)$ is the network-based similarity (e.g., Jaccard coefficient) between users $u$ and $k$.

Adjusting User Influence

Weight the contributions of neighboring users in CF by their node centrality (e.g., Betweenness or Katz centrality).

Example:

$$\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{k \in N_u} \text{sim}(u, k) \cdot c_k \cdot (r_{ki} - \bar{r}_k)}{\sum_{k \in N_u} \text{sim}(u, k) \cdot c_k}$$

where $c_k$ is the centrality score of user $k$.

Knowledge Graph-Based Recommendations

Knowledge graphs (KGs) integrate semantic relationships between users, items, and attributes into the recommendation process. A Knowledge Graph represents entities (e.g., users, items, genres) and their relationships (e.g., "directed_by," "belongs_to_genre", "stars_in") as a structured network of triples, such as (MovieA, directed_by, DirectorX) or (User1, likes, MovieA).

Knowledge Graph Embeddings

To use KGs in recommender systems, entities and relationships must be represented in a low-dimensional vector space. This process, called knowledge graph embedding, enables efficient similarity computations and integration with other recommendation techniques.

TransE (Translating Embeddings)

TransE (Translating Embeddings) is a foundational method where entities and relationships are represented as vectors such that for a triple (h, r, t):

$$\mathbf{h} + \mathbf{r} \approx \mathbf{t}$$

Here, $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are embeddings of the head entity, relation, and tail entity, respectively. The model is trained to minimize the distance between $\mathbf{h} + \mathbf{r}$ and $\mathbf{t}$.

Graph Neural Networks

Graph Neural Networks (GNNs) are another powerful approach. GNNs aggregate information from a node’s neighbors to compute its embedding, capturing both local and global structural information. For example, in a Graph Convolutional Network (GCN), the embedding of a node $v$ in layer $l+1$ is computed as:

$$\mathbf{h}_v^{(l+1)} = \sigma \left( \sum_{u \in N(v)} \frac{1}{\sqrt{|N(v)| |N(u)|}} W^{(l)} \mathbf{h}_u^{(l)} \right)$$

where $N(v)$ is the set of neighbors of $v$, and $W^{(l)}$ is a learnable weight matrix.

Path-Based Recommendation: RippleNet

RippleNet is a prominent path-based recommendation model that propagates user preferences through the KG. The process involves:

  1. Seed Selection: Start with items a user has interacted with (e.g., clicked or purchased).
  2. Ripple Propagation: Explore the KG to find entities $h$ hops away from the seeds, forming ripple sets.
  3. Preference Aggregation: Combine the embeddings of entities in the ripple sets to compute the user’s preference for a candidate item.
  4. Prediction: The probability that a user $u$ will interact with an item $v$ is computed as:

$$\hat{y}_{uv} = \sigma(\mathbf{e}_u^T \mathbf{e}_v)$$

where $\mathbf{e}_u$ and $\mathbf{e}_v$ are the embeddings of the user and item, respectively, and $\sigma$ is the sigmoid function.

RippleNet captures multi-hop relationships in the KG, provides explainable recommendations by highlighting influential paths, and can be combined with collaborative filtering for hybrid recommendations.

Conclusion

Graph-based recommender systems represent a paradigm shift in how we model and leverage relational data for personalized recommendations. By representing users, items, and their attributes as a graph, these systems can capture complex dependencies, mitigate data sparsity, and provide explainable recommendations.

From transitivity-based methods that infer indirect relationships to social network mining techniques that harness user influence, and knowledge graph-enhanced models that integrate semantic information, graph-based approaches offer a versatile and powerful framework for modern recommender systems.