Structured Data Understanding — From LLMs to the Human Brain to a Biologically Inspired Implementation
1. Problem Background
In risk-control scenarios, structured data (e.g. behavior sequences, social graphs, transaction tables) plays a core role in threat detection. While large language models (LLMs) excel at understanding text and images, they still face limitations in making sense of and reasoning over structured data:
- Difficulty aligning structure with semantics: Graph or temporal structures are hard to map into the language-based semantic space
- Input length and efficiency bottlenecks: Linearizing large-scale structured data easily exceeds the input limits of LLMs
- Weak multi-step reasoning ability: Complex relationships in structured data are difficult for LLMs to handle using chain-of-thought
- Immature fusion of heterogeneous sources: Combining graph, sequence, and tabular data into a unified representation remains an open challenge
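The input-length point can be made concrete with a rough, standard-library-only sketch. Everything in it is an assumption for illustration: the `acct_… transfers_to acct_…` edge template, the whitespace-based token count (real subword tokenizers usually produce more tokens), and the `CONTEXT_BUDGET` figure, which is not tied to any particular model.

```python
import random

def random_edges(num_nodes, avg_degree, seed=0):
    """Sample an undirected edge list with roughly the given average degree."""
    rng = random.Random(seed)
    num_edges = num_nodes * avg_degree // 2
    return [(rng.randrange(num_nodes), rng.randrange(num_nodes))
            for _ in range(num_edges)]

def linearize_edges(edges):
    """Render each edge as one line of text (hypothetical template)."""
    return "\n".join(f"acct_{u} transfers_to acct_{v}" for u, v in edges)

def rough_token_count(text):
    """Whitespace split: a lower bound on what a subword tokenizer would emit."""
    return len(text.split())

CONTEXT_BUDGET = 8_000  # hypothetical token budget, for illustration only

for n in (100, 1_000, 10_000):
    tokens = rough_token_count(linearize_edges(random_edges(n, avg_degree=10)))
    print(f"{n} nodes -> {tokens} tokens, over budget: {tokens > CONTEXT_BUDGET}")
```

Each edge contributes three whitespace-separated tokens here, so the prompt grows linearly with the number of edges: a graph of 1,000 accounts with average degree 10 already needs 15,000 tokens under this crude count.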
Models like GraphGPT, GraphRAG, and StructGPT have begun to explore how to integrate structural information with language, but most still focus on a single data type (either graphs or sequences), leaving multi-source fusion and reasoning mechanisms underdeveloped.
2. Limitations of LLMs for This Problem
- Training data lacks structure-driven tasks: LLM pretraining corpora are dominated by text, so models are rarely trained directly on graph-specific or sequence-inductive objectives
- Prompt + tool integration is limited: While RAG or tools like StructGPT can assist, they still depend on external structural models
- No native structured reasoning: Chain-of-thought prompting is not well suited for deeply structured contexts; specialized structural reasoning strategies are needed
This suggests exploring a biologically inspired architecture that mimics the brain’s ways of processing structured inputs to enhance LLM capabilities for complex reasoning.
3. How the Human Brain Tackles It: Four Structural Mechanisms
3.1 Brain Rich‑Club Network (Long-range hub organization)
The brain features a network of “rich-club” nodes—highly interconnected hub regions that integrate information across functional zones and support high-level cognition.
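One common way to quantify this kind of hub organization in a graph is the unnormalized rich-club coefficient, phi(k) = 2 * E_k / (N_k * (N_k - 1)), where N_k is the number of nodes with degree greater than k and E_k is the number of edges among them. The sketch below computes it on a toy graph in plain Python. The toy graph and the function name are illustrative assumptions, and the code assumes a simple undirected graph with no duplicate edges or self-loops.

```python
from itertools import combinations

def rich_club_coefficient(edges, k):
    """Density of the subgraph induced by nodes whose degree exceeds k."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    rich = {node for node, d in degree.items() if d > k}
    if len(rich) < 2:
        return 0.0
    edges_among_rich = sum(1 for u, v in edges if u in rich and v in rich)
    return 2 * edges_among_rich / (len(rich) * (len(rich) - 1))

# Toy graph: hubs 0..3 form a clique, and each hub has three leaf neighbours.
hubs = range(4)
edges = list(combinations(hubs, 2))
next_leaf = 4
for hub in hubs:
    for _ in range(3):
        edges.append((hub, next_leaf))
        next_leaf += 1

phi_all = rich_club_coefficient(edges, k=0)   # every node counts as "rich"
phi_hubs = rich_club_coefficient(edges, k=1)  # only the four hubs remain
print(phi_all, phi_hubs)
```

On this toy graph the coefficient rises from 0.15 over all nodes to 1.0 over the hubs alone, which is the signature of a rich club: the high-degree nodes are far more densely connected to each other than the graph is overall.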
3.2 Structured Slots (Sequence Memory + Cognitive Map Integration)
Through prefrontal–hippocampal mechanisms, the brain unifies sequence memory (state-action transitions) and cognitive maps (graph structures). Whittington et al. (2025) propose the structured slots model to explain this integration.
3.3 Episodic Buffer (Working Memory Integration Mechanism)
According to Baddeley’s working memory model, the brain uses an episodic buffer to fuse visual, language, and structural information into a coherent contextual representation.
3.4 Predictive Coding (Error-driven Learning Mechanism)
The brain employs top-down predictions and bottom-up error signals in a loop, iteratively updating its internal model, enabling stable structure perception and semantic fusion.
4. Biologically Inspired Modular Design (Python / PyTorch Implementation)
System Architecture Overview
Graph Module (rich‑club graph)
↓
Slots Module (structured slots)
↓
Episodic Buffer (structure + semantic fusion)
↓
Predictive Coding Layer (prediction + SGD learning)
Each module corresponds to one of the brain-inspired mechanisms and works together to facilitate structured data understanding and semantic integration.
4.1 Graph Module: rich-club structure + GNN implementation
- Build a central dense subgraph (the rich club) plus two sparse subgraphs (one per subtask) and interconnect them
- Use NetworkX to generate the graph, and PyTorch Geometric’s GCNConv to extract node embeddings
- Aggregate embeddings of the central nodes to produce a rich-club representation
import networkx as nx
import random
import torch
from torch_geometric.utils import from_networkx
from torch_geometric.nn import GCNConv
...
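Since the module body is elided above, here is a minimal sketch of the three steps under stated assumptions. To depend only on PyTorch, it builds the edge list in plain Python rather than NetworkX, and it swaps GCNConv for a hand-written dense layer that applies the same symmetric-normalized propagation rule, D^{-1/2}(A + I)D^{-1/2} X W. The names `build_rich_club_edges`, `normalized_adjacency`, and `DenseGraphModule`, and all sizes, are hypothetical.

```python
import random
import torch
import torch.nn as nn

def build_rich_club_edges(n_center=6, n_sub=8, p_sub=0.2, n_links=3, seed=0):
    """Dense central clique plus two sparse subgraphs linked to the centre."""
    rng = random.Random(seed)
    center = list(range(n_center))
    edges = {(i, j) for i in center for j in center if i < j}  # clique
    for s in range(2):
        start = n_center + s * n_sub
        sub = list(range(start, start + n_sub))
        # A chain keeps each sparse subgraph connected.
        edges |= {(sub[i], sub[i + 1]) for i in range(n_sub - 1)}
        # A few random extra edges inside the subgraph.
        edges |= {(a, b) for a in sub for b in sub if a < b and rng.random() < p_sub}
        # Random links between the subgraph and the rich club.
        for _ in range(n_links):
            edges.add((rng.choice(center), rng.choice(sub)))
    return sorted(edges), center, n_center + 2 * n_sub

def normalized_adjacency(edges, n):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense (n, n) tensor."""
    a = torch.eye(n)  # self-loops
    for u, v in edges:
        a[u, v] = 1.0
        a[v, u] = 1.0
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]

class DenseGraphModule(nn.Module):
    """Two dense GCN-style layers, then mean-pooling over the central nodes."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.lin2 = nn.Linear(hid_dim, out_dim, bias=False)

    def forward(self, x, a_hat, center_idx):
        h = torch.relu(a_hat @ self.lin1(x))
        h = a_hat @ self.lin2(h)
        center_repr = h[center_idx].mean(dim=0)  # rich-club representation
        return h, center_repr

torch.manual_seed(0)
edges, center, n = build_rich_club_edges()
a_hat = normalized_adjacency(edges, n)
x = torch.randn(n, 16)  # random stand-in node features
graph_module = DenseGraphModule(16, 32, 32)
h, center_repr = graph_module(x, a_hat, center)
print(h.shape, center_repr.shape)
```

Mean-pooling is only one choice of aggregation; attention pooling or a sum over the central nodes would fit the description equally well. The dense adjacency is practical only for small graphs, which is where a sparse implementation such as GCNConv becomes the better option.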
4.2 Slots Module: Structured Slots Implementation
- Use a set of trainable slots to simulate prefrontal activity slots
- Apply attention to read relevant slot states and write updates to them
- Train the module so it encodes both sequence memory and cognitive map representations
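class SlotsModule(nn.Module):
    def __init__(...):
        ...
    def forward(self, key):
        # Assumes self.slots is a trainable (num_slots, dim) tensor and key has shape (dim,)
        weights = torch.softmax(self.read(self.slots) @ key, dim=0)
        slot_read = weights @ self.slots  # attention-weighted sum over slots
        new_slot = slot_read + self.write(key)
        return slot_read, new_slot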
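A runnable version of the read/write logic might look like the following. The constructor is a guess, since the source elides it: `num_slots`, `dim`, the small random slot initialization, and the choice of `nn.Linear` for `read` and `write` are all assumptions. The mean-squared `loss_slot` against a random `target_slot` is likewise only a placeholder for the "expected next slot state" objective, which the source does not spell out.

```python
import torch
import torch.nn as nn

class SlotsModule(nn.Module):
    def __init__(self, num_slots=8, dim=32):  # hypothetical sizes
        super().__init__()
        # Trainable slot memory, one row per slot.
        self.slots = nn.Parameter(torch.randn(num_slots, dim) * 0.1)
        self.read = nn.Linear(dim, dim, bias=False)  # projects slots for matching
        self.write = nn.Linear(dim, dim)             # projects the key for writing

    def forward(self, key):
        # key: (dim,). One score per slot, normalized into attention weights.
        weights = torch.softmax(self.read(self.slots) @ key, dim=0)
        slot_read = weights @ self.slots        # attention-weighted sum, (dim,)
        new_slot = slot_read + self.write(key)  # proposed updated slot state
        return slot_read, new_slot

torch.manual_seed(0)
slots_module = SlotsModule()
key = torch.randn(32)          # stand-in for an encoded behavior-sequence step
slot_read, new_slot = slots_module(key)

target_slot = torch.randn(32)  # placeholder "expected next slot state"
loss_slot = (new_slot - target_slot).pow(2).mean()  # assumed MSE objective
loss_slot.backward()
print(slot_read.shape, new_slot.shape, loss_slot.item())
```

In this sketch `new_slot` is returned but never stored back into `self.slots`. The slot contents change only through gradient updates, which matches the "train the module" bullet but is not the only way to read "write updates to them".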
4.3 Episodic Buffer: Multimodal Structure + Semantic Fusion
- Concatenate the rich-club representation from GraphModule with the slot_read representation
- Use a linear projection layer to produce a fused contextual embedding
class EpisodicBuffer(nn.Module):
    def __init__(...):
        ...
    def forward(self, graph_repr, slot_repr):
        fused = torch.cat([graph_repr, slot_repr], dim=-1)
        return torch.relu(self.combine(fused))
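Filling in the elided constructor with a single linear layer gives a complete, if minimal, version of the buffer. The dimension arguments (`graph_dim`, `slot_dim`, `out_dim`) and their defaults are assumptions, and the random tensors stand in for the outputs of the graph and slots modules.

```python
import torch
import torch.nn as nn

class EpisodicBuffer(nn.Module):
    def __init__(self, graph_dim=32, slot_dim=32, out_dim=64):  # hypothetical sizes
        super().__init__()
        # One projection over the concatenated structural and slot features.
        self.combine = nn.Linear(graph_dim + slot_dim, out_dim)

    def forward(self, graph_repr, slot_repr):
        fused = torch.cat([graph_repr, slot_repr], dim=-1)
        return torch.relu(self.combine(fused))

torch.manual_seed(0)
episodic_buffer = EpisodicBuffer()

center_repr = torch.randn(32)  # stand-in for the rich-club representation
slot_read = torch.randn(32)    # stand-in for the slot read-out
buffer_state = episodic_buffer(center_repr, slot_read)

# Concatenating on dim=-1 means batched inputs work unchanged.
batch_state = episodic_buffer(torch.randn(4, 32), torch.randn(4, 32))
print(buffer_state.shape, batch_state.shape)
```

Concatenation followed by a projection is the simplest fusion operator. Gated or cross-attention fusion would be natural alternatives if one input should modulate the other, but the source describes only the linear variant.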
4.4 Predictive Coding Layer: Error-driven learning
- Build a layer predict(state) to predict the next state
- Compute the error state - pred and use its mean square as the training loss
- Use SGD to update parameters and simulate predictive coding dynamics
class PredCodeLayer(nn.Module):
    def __init__(...):
        ...
    def forward(self, state):
        pred = self.predict(state)
        err = state.detach() - pred
        loss = err.pow(2).mean()
        return pred, loss, err
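To show the error-driven loop end to end, the sketch below pairs the layer with `torch.optim.SGD` on a fixed batch of random stand-in states. The single `nn.Linear` predictor, the dimension, the learning rate, and the step count are all assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn

class PredCodeLayer(nn.Module):
    def __init__(self, dim=64):  # hypothetical size
        super().__init__()
        self.predict = nn.Linear(dim, dim)  # top-down prediction

    def forward(self, state):
        pred = self.predict(state)
        err = state.detach() - pred  # bottom-up error; the target carries no gradient
        loss = err.pow(2).mean()
        return pred, loss, err

torch.manual_seed(0)
layer = PredCodeLayer(dim=64)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.5)
state = torch.randn(16, 64)  # stand-in for a batch of buffer states

losses = []
for step in range(200):
    optimizer.zero_grad()
    pred, loss, err = layer(state)
    loss.backward()   # the prediction error drives the parameter update
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Because the target is `state` itself, the layer as written learns to reconstruct its input. Predicting a genuinely later state would require passing that state in as a separate target, which the source does not show.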
5. Integrated Example Pseudocode Structure
# Inputs include: graph data, behavior sequence key, expected next slot state, etc.
optimizer.zero_grad()  # clear gradients left over from the previous step
h, center_repr = graph_module(data)
slot_read, new_slot = slots_module(key)  # forward() in 4.2 returns two values
buffer_state = episodic_buffer(center_repr, slot_read)
pred, loss_pc, err = predcode_layer(buffer_state)
loss_slot = ...
loss = loss_pc + loss_slot
loss.backward()
optimizer.step()
This design is intended to simulate rich-club architecture, structured slots, episodic buffer-based integration, and predictive-coding-based learning, all trained with SGD.
6. Future Outlook
- Starting from LLM limitations in structured data understanding, we draw inspiration from four brain mechanisms
- Each module is mapped to a neurally plausible function and implemented in PyTorch
- The integrated system uses predictive coding + SGD to simulate bio-inspired learning and semantic-structure fusion
- It can be extended to downstream risk-control tasks: graph-based account network reasoning, slot-based behavior sequence storage, buffer-level multimodal fusion, and predictive coding for structure-aware reasoning
- If this paradigm shows promising results on specific tasks, it may further motivate bio-inspired research directions