Great post, Nathan. I always like reading your take; it's super interesting.

I think the open source movement is great and necessary, but we need to see the potential risks of releasing these models. I would prefer a large open source foundation taking charge of training data licensing and model training. The resulting models would then be licensed to multiple commercial players. I think some of the current players training “open source” models represent their own specific interests. Thus the current open source models have lacked (a) quality (BLOOM, Galactica), (b) commercial licensing terms (OPT, LLaMA), and (c) risk/threat management: are any open source models producing data regarding their misuse at scale?

For example, practically nobody talks about the need for alternatives to OpenAI’s Moderation endpoint, which I think will be included in most commercial products that enterprises release to end clients.

If they attached a cryptocurrency for micropayments on prompts, it would be profitable.

Hi Nathan, your articles helped me a lot in understanding recent trends. Thank you for that!

I have a basic question: how do models like OpenAI Codex produce syntactically correct code every time they generate code? I can understand that they can generate code, but I can't understand how they get the syntax right every time.

