Open Access and Open Source in the Era of Large Language Models: The Real ‘Open’ Debate

Xianbo QIAN
Dec 26, 2023

Traditionally, Open Access means giving the public free, unrestricted access to a work. It is like a key that unlocks a treasure trove of knowledge. Under many Creative Commons licenses, for instance, users are free to redistribute works, and often to modify them as well. Platforms like Hugging Face bring together many of the world’s most capable openly available large language models (LLMs), a store of largely untapped potential waiting to be discovered and put to use. Here, exploration and experimentation cost nothing. The essence of Open Access is making these powerful tools available to everyone, whether educators, researchers, or enthusiasts.

Open Source, by contrast, means that the source code is publicly available, and it carries a spirit of free collaboration. Linux, for example, has built a vast developer ecosystem on its open-source ethos. In such an environment, a model’s source code is no longer a closely guarded secret but a public asset. Its inner workings are on display, inviting anyone interested to dig deeper, improve it, or even reshape it. Open Source is not just a way to access knowledge; it is a culture that fosters participation and progress.

Open Weights ≠ True Open Source:

In the age of LLMs, the distinction between Open Access and Open Source blurs. The weights of an LLM can be considered part of the program, so many assume that a model with open weights is open source. It is not, at least not fully. A model that releases only its weights may still keep its training code, training data, and technical details secret. This limits public involvement and the ability to improve the model, and it falls short of the spirit of open source. Without broader openness, fine-tuning a model well or retraining it from scratch becomes a significant challenge, and building a community around it is difficult.

Open-source purists argue that this is open access, not genuine open source. From a user’s perspective, it is essential to distinguish releasing weights alone from complete open sourcing, which also covers training code, data, technical papers, and so on.
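
The distinction can be written down as a small checklist. The sketch below is purely illustrative: the `ModelRelease` class, its fields, and the labels returned by `classify_openness` are hypothetical names made up for this example, not an established standard or any platform’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical record of which artifacts a model release makes public."""
    weights: bool = False
    training_code: bool = False
    training_data: bool = False
    technical_report: bool = False


def classify_openness(release: ModelRelease) -> str:
    """Return an illustrative label; the categories are this sketch's own."""
    # Simplification: without public weights we treat the model as closed,
    # even if other artifacts happen to be published.
    if not release.weights:
        return "closed"
    # "Fully open" here means every artifact listed in the text is public.
    if release.training_code and release.training_data and release.technical_report:
        return "fully open source"
    # Weights are public but at least one other artifact is withheld.
    return "open weights (open access)"


weights_only = ModelRelease(weights=True)
full_release = ModelRelease(
    weights=True, training_code=True, training_data=True, technical_report=True
)

print(classify_openness(weights_only))
print(classify_openness(full_release))
```

Real releases sit on a spectrum, and licenses matter as well, so a three-way label like this is only a starting point for asking what has actually been published.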

We look forward to more open-source initiatives, but the pace of open sourcing also has to fit each organization’s specific goals. Whatever the approach, we hope to see more exemplary releases that help the community thrive. Open Access may not sound as grand, but it deserves appreciation too. The real value of technology lies in how fully we use these open-access and open-source technologies to build a better world together.

Written by: Gemini + ChatGPT4 + me

Image by: DALL·E
