Evaluating Long-Context Question and Answer Systems

(eugeneyan.com)

15 points | by swyx 4 days ago

1 comment

  • rooftopzen 1 day ago
    Seems AI-generated; if not, there's nothing new here. The post regurgitates info that has been known for a long time and misses the biggest nuances of “LLM-as-a-judge”, as if it were written in 2023 for an audience living under a rock (why?):

    >> This is where LLM-evaluators (also called “LLM-as-Judge”) can help