Sounds more like you want to do something reranking-ish. Ideally, you would first train a retrieval system, on a dataset not very different from MS MARCO, to retrieve the most relevant pages. That gets you a small set of documents to rerank.
For the reranker to detect commercial bias, insincerity, or bloat, you could use LLMs, but IIRC the usual approach is to train a multiclass classifier for each, combine the probabilities from each head (calibrate too?) into a score, and use that score as a weight in your ranking?
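To make that concrete, here is a rough sketch of what I mean by combining the heads. Everything in it is an assumption for illustration: the `Doc` type, the head names, the weights, and the multiplicative penalty are made up, and the probabilities are assumed to come from already-trained (and ideally calibrated) classifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    url: str
    relevance: float  # retriever score, assumed to be in [0, 1]
    # Hypothetical per-head probabilities, e.g. {"bloat": 0.8}
    probs: dict = field(default_factory=dict)


# Illustrative weights only; in practice these would be tuned.
WEIGHTS = {"commercial_bias": 0.5, "insincerity": 0.3, "bloat": 0.2}


def quality_penalty(probs, weights=WEIGHTS):
    """Weighted sum of head probabilities; a missing head counts as 0."""
    return sum(w * probs.get(head, 0.0) for head, w in weights.items())


def rerank(docs, weights=WEIGHTS):
    """Sort by relevance scaled down by the combined penalty."""
    def score(doc):
        return doc.relevance * (1.0 - quality_penalty(doc.probs, weights))
    return sorted(docs, key=score, reverse=True)


docs = [
    Doc("https://spammy.example/best-10-widgets", 0.9,
        {"commercial_bias": 0.9, "insincerity": 0.8, "bloat": 0.9}),
    Doc("https://plain.example/widgets", 0.7,
        {"commercial_bias": 0.1, "insincerity": 0.1, "bloat": 0.1}),
]
reranked = rerank(docs)
```

Since the weights sum to 1, the penalty stays in [0, 1] and the final score never goes negative. A linear blend or a learned combiner would work just as well; the multiplication is only one option.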
I think Kagi should add a feature where I can subscribe to someone else's domain blocks. Every time I see a spam blog, I can easily prevent the domain from polluting future results. But it'd be great if I could also use my friends' lists to push their blocked domains to the end of my search results.
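Roughly what I have in mind, as a sketch only. This is not Kagi's API; `demote_blocked` and its arguments are hypothetical names, and domains are matched by exact hostname.

```python
from urllib.parse import urlparse


def domain(url):
    """Hostname of a URL, or an empty string if it has none."""
    return urlparse(url).hostname or ""


def demote_blocked(results, own_blocks, subscribed_lists):
    """Drop my own blocked domains; move subscribed blocks to the end.

    results: URLs in their original ranked order.
    own_blocks: set of domains I blocked myself.
    subscribed_lists: list of sets of domains blocked by people I follow.
    """
    demoted = set().union(*subscribed_lists)
    visible = [u for u in results if domain(u) not in own_blocks]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(visible, key=lambda u: domain(u) in demoted)


results = [
    "https://spam.example/a",
    "https://good.example/b",
    "https://meh.example/c",
    "https://fine.example/d",
]
ranked = demote_blocked(results, {"spam.example"}, [{"meh.example"}])
```

My own blocks still remove results outright, while a friend's block only demotes, so one overzealous list cannot hide a page from me entirely.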