BLEU - evaluation metric for NLP models - compares model output to human-written references - it is the most widely used performance metric for language models but has a lot of flaws - the author says to use it as a model correction tool, not a model evaluation tool
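To make the "compares model output to human references" part concrete, here is a minimal sketch of a sentence-level BLEU score in Python. It assumes whitespace-tokenized input, uniform weights over 1- to 4-grams, and no smoothing; the function names `ngrams` and `simple_bleu` are made up for these notes and are not from any library. Real toolkits differ in tokenization and smoothing, so treat this as an illustration rather than a reference implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: clipped n-gram precision times a brevity penalty."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # assumption: no smoothing, so any zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
exact_match = simple_bleu(reference, [reference])
paraphrase = simple_bleu("a feline rested upon a rug".split(), [reference])
```

One of the flaws shows up directly in this sketch: the exact copy scores 1.0, while the paraphrase shares no words with the reference and scores 0.0 even though a human might accept it as a translation.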