What are open-weights AI models?

Open-weights AI models are models whose weights[1] have been published under some type of open license. Prominent examples include Meta AI's Llama models and some of Mistral AI's models. They are typically contrasted with proprietary models that offer only API access, such as the recent models from OpenAI[2] and Anthropic[3].

Are open-weights models open source?

There is some debate[4] over which open-weights models (if any) conform to the definition and principles of the Open Source movement, and thus whether such models should be called "open source".

Open-source software is generally perceived positively by programmers because of its advantages over proprietary software.

There are a few ways that open-weights models differ from open-source software. Open-source projects have their source code exposed in a way that allows individual developers to find bugs and modify the project for their own use. Developers can in principle access and modify the weights of open-weights models; however, because these weights are inscrutable, they cannot be inspected for bugs or usefully modified by hand. In that way, open-weights models are more similar to an opaque compiled binary file than to legible source code. Furthermore, open-weights models generally don’t include the training data or procedure used to produce their weights.
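To make the "opaque compiled binary" analogy concrete, here is a minimal sketch (using a hypothetical stand-in tensor, not weights from any real model) of what a released weight file actually contains: a large array of floating-point numbers that, unlike source code, cannot be meaningfully read or hand-edited.

```python
import numpy as np

# Stand-in for one layer of a released open-weights model: a 4096x4096
# weight matrix (~16.8 million parameters). Real models contain hundreds
# of such tensors.
rng = np.random.default_rng(0)
weights = rng.normal(loc=0.0, scale=0.02, size=(4096, 4096)).astype(np.float32)

print(weights.shape)   # (4096, 4096)
print(weights.size)    # 16777216 parameters in this one tensor alone
print(weights[0, :3])  # a few raw values -- meaningless in isolation
```

Each individual number here is perfectly accessible, yet inspecting them tells you nothing about the model's behavior, which is why "open weights" gives far less insight than open source code does.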

On the flip side, open-weights models can be fine-tuned to better serve specific uses or to improve their performance. They also make it easier for independent researchers to conduct safety research and analysis than proprietary models do. And finally, they are free.[5]
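The fine-tuning mentioned above can be illustrated with a deliberately tiny toy model (this is a sketch of the principle, not how LLM fine-tuning is actually implemented): we start from "pretrained" weights and take a few gradient steps on new task data, nudging the existing weights rather than training from scratch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend these weights came from a released open-weights model.
pretrained_w = np.array([1.0, -2.0, 0.5])

# New task: data generated by a slightly different weight vector.
target_w = np.array([1.2, -1.8, 0.4])
X = rng.normal(size=(200, 3))
y = X @ target_w

# "Fine-tune" with plain gradient descent on mean squared error.
w = pretrained_w.copy()
lr = 0.05
for _ in range(100):
    grad = 2 * X.T @ (X @ w - y) / len(X)  # d(MSE)/dw
    w -= lr * grad

print(np.round(w, 2))  # ends up close to target_w
```

Real fine-tuning of open-weights LLMs applies this same idea at vastly larger scale, which is both what makes the models adaptable and, as discussed below, what makes their safety training easy to undo.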

Some models are more open than others

To complicate things further, as with open-source software, some open-weights models are more "open" than others, for example through differing licensing restrictions. For instance, the Llama 2 and Llama 3 licenses did not allow using their outputs to train other models (distillation), whereas Llama 3.1's license permits this, and Mistral's Mixtral models were released under the permissive Apache 2.0 license.

Furthermore, models may differ in the openness of their training data and procedures. Earlier models such as BERT were trained on public datasets, while most modern models use proprietary ones. Similarly, the Llama family's training procedures were described in detail, whereas Mistral AI released only some of the parameters they used for training.

In summary, open-weights models share some open-source virtues like adaptability and free access, while also differing from standard open-source code in their inscrutability, lack of training transparency, and limited editability. Despite these differences, they have generally been well received by the open-source community.

Is releasing model weights good for AI safety?

People in the AI safety community have differing intuitions regarding whether releasing model weights is positive or negative for safety.

On the positive side, releasing open-weights models allows:

  • Experimentation to better understand AI models and their properties in general, which might be necessary for some alignment approaches, such as mechanistic interpretability

  • Public scrutiny of the models

  • Alignment research such as red-teaming by independent researchers[6]

  • Testing of the models' safety properties

State-of-the-art open-weights models also weaken the incentive to race, in much the same way that freely available content reduces the value of intellectual property.

On the negative side, such models are particularly susceptible to misuse, since once they are released, access to them can never be restricted again. While most frontier open-weights models are released with some fine-tuning that aims to prevent misuse, this fine-tuning is cheap and easy to remove. It could turn out that future models are so capable at a variety of dangerous tasks that they should simply not be released.[7] Additionally, publishing open-weights models, and especially fully open-source models with training methodologies, could accelerate AI capabilities advancement by revealing valuable techniques and architectures that competitors can build upon, potentially leading to faster overall AI development.

Further reading:


  1. Strictly speaking, open-weights models include more than the weights; they include all the parameters needed to independently run the model.

  2. The ironically named OpenAI now mostly releases closed-source models, but it has also released some open-weights models, such as Whisper.

  3. Google DeepMind has released the weights of some models and provides closed-source models through API access.

  4. The Open Source Initiative (which controls the most commonly accepted standard for the term "Open Source") has criticized Meta for its use of the term. As of mid-2024, this group is working on a definition of "Open Source AI".

  5. Arguably, this can be a downside, since the creators of open-source software might be unpaid.

  6. Red-teaming can also be done on closed-source models through API calls, but this can be more limited for independent researchers.

  7. This also applies to models gated through API access, although the latter is much easier to monitor.



AISafety.info

We’re a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.