Evaluating Long-Context Question and Answer Systems / Hacker News