Structured Data Understanding — From LLMs to the Human Brain to a Biologically Inspired Implementation

1. Problem Background

In risk-control scenarios, structured data (e.g. behavior sequences, social graphs, transaction tables) plays a core role in threat detection. While large language models (LLMs) excel at understanding text and images, they still face limitations in making sense of and reasoning over structured data:

Difficulty aligning structure with semantics: Graph or temporal structures are hard to map into the language-based semantic space
Input length and efficiency bottlenecks: Linearizing large-scale structured data easily exceeds the input limits of LLMs
Weak multi-step reasoning ability: Complex relationships in structured data are difficult for LLMs to handle using chain-of-thought
Immature fusion of heterogeneous sources: Combining graph, sequence, and tabular data into a unified representation remains an open challenge

Models like GraphGPT, GraphRAG, and StructGPT have begun to explore how to integrate structural information with language, but most still focus on a single data type (either graphs or sequences), leaving multi-source fusion and reasoning mechanisms underdeveloped.

2. Limitations of LLMs for This Problem

Training data lacks structure-driven tasks: LLMs haven’t been pretrained on graph-specific or sequence-inductive tasks
Prompt + tool integration is limited: While RAG or tools like StructGPT can assist, they still depend on external structural models
No native structured reasoning: Chain-of-thought prompting is not well suited for deeply structured contexts; specialized structural reasoning strategies are needed

This suggests exploring a biologically inspired architecture that mimics the brain’s ways of processing structured inputs to enhance LLM capabilities for complex reasoning.

3. How the Human Brain Tackles It: Four Structural Mechanisms

3.1 Brain Rich‑Club Network (Long-range hub organization)

The brain features a network of “rich-club” nodes—highly interconnected hub regions that integrate information across functional zones and support high-level cognition.

3.2 Structured Slots (Sequence Memory + Cognitive Map Integration)

Through prefrontal–hippocampal mechanisms, the brain unifies sequence memory (state-action transitions) and cognitive maps (graph structures). Whittington et al. (2025) propose the structured slots model to explain this integration.

3.3 Episodic Buffer (Working Memory Integration Mechanism)

According to Baddeley’s working memory model, the brain uses an episodic buffer to fuse visual, language, and structural information into a coherent contextual representation.

3.4 Predictive Coding (Error-driven Learning Mechanism)

The brain employs top-down predictions and bottom-up error signals in a loop, iteratively updating its internal model, enabling stable structure perception and semantic fusion.

4. Biologically Inspired Modular Design (Python / PyTorch Implementation)

System Architecture Overview

Graph Module (rich‑club graph)
↓
Slots Module (structured slots)
↓
Episodic Buffer (structure + semantic fusion)
↓
Predictive Coding Layer (prediction + SGD learning)

Each module corresponds to one of the brain-inspired mechanisms and works together to facilitate structured data understanding and semantic integration.

4.1 Graph Module: rich-club structure + GNN implementation

Build a central dense subgraph (rich-club) plus two sparse subgraphs (each for a subtasks) and interconnect them
Use NetworkX to generate the graph, and PyTorch Geometric’s GCNConv to extract node embeddings
Aggregate embeddings of the central nodes to produce a rich-club representation

import networkx as nx
import random
import torch
from torch_geometric.utils import from_networkx
from torch_geometric.nn import GCNConv
...

4.2 Slots Module: Structured Slots Implementation

Use a set of trainable slots to simulate prefrontal activity slots
Apply attention to read relevant slot states and write updates to them
Train the module so it encodes both sequence memory and cognitive map representations

class SlotsModule(nn.Module):
    def __init__(...):
        ...
    def forward(self, key):
        weights = softmax(self.read(slots) @ key)
        slot_read = weighted sum over slots
        new_slot = slot_read + self.write(key)
        return slot_read, new_slot

4.3 Episodic Buffer: Multimodal Structure + Semantic Fusion

Concatenate the rich-club representation from GraphModule with the slot_read representation
Use a linear projection layer to produce a fused contextual embedding

class EpisodicBuffer(nn.Module):
    def __init__(...):
        ...
    def forward(self, graph_repr, slot_repr):
        fused = torch.cat([graph_repr, slot_repr], dim=-1)
        return torch.relu(self.combine(fused))

4.4 Predictive Coding Layer: Error-driven learning

Build a layer predict(state) to predict the next state
Compute error state - pred as the training loss
Use SGD to update parameters and simulate predictive coding dynamics

class PredCodeLayer(nn.Module):
    def __init__(...):
        ...
    def forward(self, state):
        pred = self.predict(state)
        err = state.detach() - pred
        loss = err.pow(2).mean()
        return pred, loss, err

5. Integrated Example Pseudocode Structure

# Inputs include: graph data, behavior sequence key, expected next slot state, etc.

h, center_repr = graph_module(data)
slot_read, new_slot, att = slots_module(key)
buffer_state = episodic_buffer(center_repr, slot_read)
pred, loss_pc, err = predcode_layer(buffer_state)
loss_slot = ...
loss = loss_pc + loss_slot
loss.backward()
optimizer.step()

This design is intended to simulate rich-club architecture, structured slots, episodic buffer-based integration, and predictive-coding-based learning, all trained with SGD.

6. Future Outlook

Starting from LLM limitations in structured data understanding, we draw inspiration from four brain mechanisms
Each module is mapped to a neurally plausible function and implemented in PyTorch
The integrated system uses predictive coding + SGD to simulate bio-inspired learning and semantic-structure fusion
It can be extended to downstream risk-control tasks: graph-based account network reasoning, slot-based behavior sequence storage, buffer-level multimodal fusion, and predictive coding for structure-aware reasoning
If this paradigm shows promising results on specific tasks, it may further motivate bio-inspired research directions