NeuraDock Agent: A Boundary-Aware Context-Grounding Architecture for Low-Channel EEG Agents
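Original: arxiv.org
NeuraDock Agent is an open-source architecture that separates a deterministic local EEG engine from a hardware-aware language layer. It parses seven-channel EEG, performs quality control and reviewed spectral workflows, and generates machine-readable results. The large language model receives only an allowlisted summary and a versioned context pack covering the hardware description, workflows, result fields, implementation boundaries, scientific limits, and reference cases; raw EEG and dense array data stay local. The evaluation has three levels: 12 recordings gave consistent results over ten numerical repetitions; request-capture and failure-injection experiments verified the data boundary and the preservation of local artifacts; and a boundary-awareness benchmark of 36 ordinary and adversarial questions under four context-ablation settings and two LLMs produced 288 outputs, supporting the feasibility of hardware- and implementation-aware grounding without validating clinical effectiveness.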
Large language models (LLMs) can make scientific software easier to use. However, a general model does not automatically know which measurements a particular sensor can support, which algorithms are implemented in the current software, or which conclusions are justified by a computed result. These distinctions are especially important for low-channel electroencephalography (EEG), where sparse spatial coverage and variable signal quality make plausible but unsupported interpretations easy to produce. We present NeuraDock Agent, an open-source architecture that separates a deterministic local EEG engine from a hardware-aware language layer. The numerical engine parses recordings, performs quality control, executes reviewed spectral workflows, and writes machine-readable artifacts. The LLM receives only a compact, allowlisted summary and a versioned context pack. The context pack describes the seven-channel hardware, reviewed workflows, result fields, implementation boundaries, scientific limits, and reference cases. Raw EEG and dense per-sample arrays remain local. We evaluate the system at three levels. First, 12 recordings produced identical structured results over ten numerical repetitions, and a complete Rest/Task run produced identical result, report, and figure hashes over three repetitions. Second, request-capture and failure-injection experiments confirmed the tested data boundary and the preservation of local artifacts under HTTP, malformed-output, and connection failures. Third, a boundary-awareness benchmark tested 36 ordinary and adversarial questions under four context ablations and two LLMs, yielding 288 outputs. These results support hardware- and implementation-aware grounding as a practical mechanism for calibrating what an EEG agent accepts, qualifies, or refuses; they do not establish clinical validity or a validated absolute cognitive-load index.
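The data boundary and the hash-based repeatability check described above can be illustrated with a minimal sketch. This is not NeuraDock Agent's code: the field names in `ALLOWED_FIELDS`, the helpers `build_llm_summary` and `artifact_hash`, and the values in `local_result` are all hypothetical placeholders chosen for illustration. The sketch assumes an allowlist is applied as a simple key filter and that repeatability is checked by hashing a canonical JSON serialization.

```python
import hashlib
import json

# Hypothetical allowlist: the actual result schema is not given in the abstract.
ALLOWED_FIELDS = {"recording_id", "n_channels", "qc_passed", "band_power_summary"}


def build_llm_summary(result: dict) -> dict:
    """Return only allowlisted fields; anything else never leaves the machine."""
    return {key: value for key, value in result.items() if key in ALLOWED_FIELDS}


def artifact_hash(obj: dict) -> str:
    """SHA-256 of a canonical JSON form, so key order does not affect the hash."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values for illustration only, not measured data.
local_result = {
    "recording_id": "rec-001",
    "n_channels": 7,
    "qc_passed": True,
    "band_power_summary": {"alpha": 0.42, "theta": 0.31},
    "raw_samples": [[0.1, 0.2, 0.3] for _ in range(7)],  # dense array stays local
}

summary = build_llm_summary(local_result)

# Repeating the same deterministic step should give a single distinct hash.
hashes = {artifact_hash(build_llm_summary(local_result)) for _ in range(10)}

# Size of the benchmark grid stated in the abstract: questions x ablations x LLMs.
n_outputs = 36 * 4 * 2
```

An allowlist fails closed: a field added to the local result later is excluded from the LLM request until someone reviews it and adds it explicitly, which is the property a denylist would not give.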