Debate over “open source AI” term brings new push to formalize definition

A man peers over a glass partition, seeking transparency.

The Open Source Initiative (OSI) recently unveiled its latest draft definition for "open source AI," aiming to clarify the ambiguous use of the term in the fast-moving field. The move comes as some companies like Meta release trained AI language model weights and code with usage restrictions while using the "open source" label. This has sparked intense debates among free-software advocates about what truly constitutes "open source" in the context of AI.

For instance, Meta's Llama 3 model, while freely available, doesn't meet the traditional open source criteria as defined by the OSI for software because its license restricts usage based on company size and on the types of content that can be produced with the model. The AI image generator Flux is another "open" model that is not truly open source. Because of this ambiguity, we've typically described AI models that include code or weights with restrictions, or that lack accompanying training data, with alternative terms like "open-weights" or "source-available."

To address the issue formally, the OSI—which is well-known for its advocacy for open software standards—has assembled a group of about 70 participants, including researchers, lawyers, policymakers, and activists. Representatives from major tech companies like Meta, Google, and Amazon also joined the effort. The group's current draft definition (version 0.0.9) of open source AI emphasizes "four fundamental freedoms" reminiscent of those defining free software: allowing users of the AI system to use it for any purpose without asking for permission, study how it works, modify it for any purpose, and share it with or without modifications.

By establishing clear criteria for open source AI, the organization hopes to provide a benchmark against which AI systems can be evaluated. This will likely help developers, researchers, and users make more informed decisions about the AI tools they create, study, or use.

Truly open source AI may also shed light on potential software vulnerabilities of AI systems, since researchers will be able to see how the AI models work behind the scenes. Compare this approach with an opaque system such as OpenAI's ChatGPT, which is more than just a GPT-4o large language model with a fancy interface—it's a proprietary system of interlocking models and filters, and its precise architecture is a closely guarded secret.

OSI's project timeline indicates that a stable version of the "open source AI" definition is expected to be announced in October at the All Things Open 2024 event in Raleigh, North Carolina.

“Permissionless innovation”

In a press release from May, the OSI emphasized the importance of defining what open source AI really means. "AI is different from regular software and forces all stakeholders to review how the Open Source principles apply to this space," said Stefano Maffulli, executive director of the OSI. "OSI believes that everybody deserves to maintain agency and control of the technology. We also recognize that markets flourish when clear definitions promote transparency, collaboration and permissionless innovation."

The organization's most recent draft definition extends beyond just the AI model or its weights, encompassing the entire system and its components.

For an AI system to qualify as open source, it must provide access to what the OSI calls the "preferred form to make modifications." This includes detailed information about the training data, the full source code used for training and running the system, and the model weights and parameters. All these elements must be available under OSI-approved licenses or terms.

Notably, the draft doesn't mandate the release of raw training data. Instead, it requires "data information"—detailed metadata about the training data and methods. This includes information on data sources, selection criteria, preprocessing techniques, and other relevant details that would allow a skilled person to re-create a similar system.

The "data information" approach aims to provide transparency and replicability without necessarily disclosing the actual dataset, ostensibly addressing potential privacy or copyright concerns while sticking to open source principles, though that particular point may be up for further debate.

"The most interesting thing about [the definition] is that they're allowing training data to NOT be released," said independent AI researcher Simon Willison in a brief Ars interview about the OSI's proposal. "It's an eminently pragmatic approach—if they didn't allow that, there would be hardly any capable 'open source' models."

Source: arstechnica.com
