Multi-Genre Chord-Symbol Modeling: Capabilities and Limits of Lightweight Adaptation for a Frozen Pop-Jazz Music Transformer
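Source: arxiv.org
Starting from a frozen pop-jazz Music Transformer checkpoint, the study compares five adaptation methods (LoRA, IA3, BitFit, prefix tuning, and full fine-tuning) for extending the model to 11 target genres, including blues, bossa nova, and Bach chorales. Across 165 experimental runs, every method beats the frozen base model on held-out chord prediction, with macro gains of +2.89 to +3.61 points; LoRA and IA3 score highest, but statistical tests do not support a decisive winner. With data size controlled, IA3 still leads while LoRA's full-data advantage disappears. Diagnostics indicate that chord-symbol adaptation reliably improves genre-local harmonic prediction, but that chord symbols alone do not carry complete genre identity.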
Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical convention meet. This report treats chord-symbol sequences not as a complete representation of music, but as an interpretable, controllable time series for genre-local harmonic modeling. Starting from a frozen pop-jazz Music Transformer checkpoint, I evaluate how far small adaptation interfaces can extend the model to eleven target genres: blues, bossa nova, Bach chorales, country, electronic, folk, funk, gospel, hip-hop, R&B/soul, and rock. The main evaluation compares LoRA, IA3, BitFit, prefix tuning, and full fine-tuning over 11 genres and 3 seeds, a complete 165-cell grid. All five methods improve over the frozen base on held-out chord prediction, with macro gains from +2.89 to +3.61 points; LoRA and IA3 score highest, but Wilcoxon tests with Holm and Benjamini-Hochberg correction do not support a decisive winner. A matched-data-size control sharpens this: when genres are sub-sampled to a common corpus size, IA3 stays on top but LoRA's full-data edge disappears and it falls to last, indicating the small gaps are partly data-driven. A control-token baseline is also strong, and wrong-genre adapters often beat the frozen base, suggesting much of the effect comes from lightweight conditioning over a reusable harmonic base rather than one particular adapter family. Additional diagnostics (rank sweeps, wrong-genre rotation, a base-checkpoint ablation, chord-only genre classification, generated-output statistics, real-song evaluation, and duplicate analysis) support a bounded conclusion: chord-symbol adaptation reliably improves genre-local harmonic prediction, but chord symbols alone do not carry complete genre identity. The report therefore avoids claims about perceived genre authenticity or full musical quality, which require controlled listener or musician evaluation.
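To make the evaluation design concrete, the following is a minimal sketch of how the 165-cell grid and the two multiple-comparison corrections named above (Holm and Benjamini-Hochberg) could be computed. It is not the report's code. The seed values, the identifier spellings, and the p-values are placeholders assumed purely for illustration; in practice the raw p-values would come from paired Wilcoxon signed-rank tests (for example `scipy.stats.wilcoxon`) on per-cell scores.

```python
from itertools import product

# Methods and genres as listed in the abstract; identifier spellings are assumed.
METHODS = ["lora", "ia3", "bitfit", "prefix_tuning", "full_ft"]
GENRES = ["blues", "bossa_nova", "bach_chorales", "country", "electronic",
          "folk", "funk", "gospel", "hip_hop", "rnb_soul", "rock"]
SEEDS = [0, 1, 2]  # three seeds; the actual seed values are an assumption

# One cell per (method, genre, seed) combination: 5 * 11 * 3 = 165.
grid = list(product(METHODS, GENRES, SEEDS))


def holm(pvals):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (0-based rank) is multiplied by (m - rank);
        # the running maximum keeps adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / (1-based rank).
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


# Placeholder p-values for illustration only; these are NOT results from the report.
raw_p = [0.01, 0.04, 0.03, 0.005]
holm_p = holm(raw_p)
bh_p = benjamini_hochberg(raw_p)
```

Holm is the more conservative of the two, so a method comparison that fails to reach significance under Benjamini-Hochberg will also fail under Holm. Reporting both, as the abstract does, shows whether a "no decisive winner" conclusion depends on the choice of correction.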