Key observations:

- RoBERTa achieves the highest accuracy, consistent with its improved training recipe.
- DistilBERT trails BERT by only 0.7 percentage points while using 40% fewer parameters.
- ALBERT has the lowest accuracy, likely because its cross-layer parameter sharing limits capacity even though it has 12 layers.
- T5-Small performs c
