In general this is just copying work done by Anthropic, so there's nothing fundamentally new here.

What they have done here is identify patterns internal to GPT-4 that correspond to specific, identifiable concepts. The work was done by OpenAI's now mostly dismantled safety team (it carries the names of the team's recently departed co-leads, Ilya & Jan Leike), so it is nominally being done for safety reasons: to be able to boost or suppress the activation of specific concepts while the model is running, as in Anthropic's demonstration of boosting their model's fixation on the Golden Gate Bridge:

https://www.anthropic.com/news/golden-gate-claude
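To make "boost or suppress" a bit more concrete, here's a toy sketch. It assumes a concept is represented as a single direction in the model's activation space and that you can edit the activation vector directly. That's a simplifying assumption: the published work is more involved, and this is not OpenAI's or Anthropic's code. The function names `boost` and `suppress` are hypothetical.

```python
import math

def normalize(v):
    # Scale v to unit length so the concept direction has norm 1.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def boost(h, d, alpha):
    """Hypothetical: push activation h further along concept direction d."""
    u = normalize(d)
    return [x + alpha * y for x, y in zip(h, u)]

def suppress(h, d):
    """Hypothetical: remove the component of h that lies along concept direction d."""
    u = normalize(d)
    c = dot(h, u)  # how strongly the concept is currently "active"
    return [x - c * y for x, y in zip(h, u)]

h = [1.0, 2.0, 3.0]  # made-up activation vector
d = [0.0, 0.0, 1.0]  # made-up concept direction
print(boost(h, d, 5.0))  # [1.0, 2.0, 8.0]
print(suppress(h, d))    # [1.0, 2.0, 0.0]
```

The same two operations are what make this useful for control in general, not just for safety: you pick the direction and decide whether to amplify it or project it out.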

This kind of work would also seem to have potential functional uses beyond safety, given that it allows you to control the model's behavior in specific ways.
