· The masked language model task is the key to both BERT and RoBERTa. However, they differ in how they prepare this masking (a short sketch of the difference appears after this list). The original RoBERTa article explains it in section 4.1: BERT relies on …
· Although BERT preceded RoBERTa, this observation can be understood as largely applicable to RoBERTa as well, since the two are very similar. You may, nonetheless, experiment with the …
· First of all, this series of models uses Qwen3 as the backbone; compared with the XLM-RoBERTa behind the BGE series, it is a complete switch to an LLM. And since a large model is used, a prompt is needed, which is what brings in "instruction following (Instruction Aware)" (see the second sketch below).
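To make the masking point in the first two snippets concrete, here is a minimal sketch of RoBERTa-style dynamic masking, assuming the Hugging Face `transformers` library; the model name and example sentence are illustrative. The collator re-samples the masked positions every time a batch is built, rather than fixing them once during preprocessing.

```python
# Minimal sketch, assuming the Hugging Face `transformers` library is installed.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # the usual 15% masking rate
)

# Tokenize one sentence once, as a static preprocessing step would.
enc = tokenizer("The masked language model task is the key to BERT and RoBERTa.")

# Dynamic masking: each call re-samples which tokens become <mask>,
# so the same sentence receives a different mask pattern across epochs.
batch_a = collator([enc])
batch_b = collator([enc])
print(batch_a["input_ids"][0])
print(batch_b["input_ids"][0])
```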
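For the Qwen3-based embedding series in the last snippet, the sketch below shows one way the "Instruction Aware" usage can look with `sentence-transformers`. The checkpoint name, instruction text, and the `Instruct: ... Query: ...` template are assumptions taken from the published usage examples, so verify them against the model card of the exact checkpoint you use.

```python
# Hedged sketch, assuming `sentence-transformers` and a Qwen3-Embedding checkpoint;
# the prompt template below (instruction on the query side only) is an assumption,
# not a specification.
from sentence_transformers import SentenceTransformer, util

def build_query(task: str, query: str) -> str:
    # The task instruction is prepended to the query; documents are encoded as-is.
    return f"Instruct: {task}\nQuery: {query}"

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")  # illustrative checkpoint name

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [build_query(task, "How does RoBERTa prepare its masked tokens?")]
documents = [
    "RoBERTa re-samples the masked positions every time a sequence is fed to the model.",
    "BGE models use XLM-RoBERTa as their backbone.",
]

query_emb = model.encode(queries)
doc_emb = model.encode(documents)
print(util.cos_sim(query_emb, doc_emb))  # higher score = more relevant document
```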