Skip to main content

Open Source Coalition Announces 'Model-Signing' with Sigstore to Strengthen the ML Supply Chain

3 months 1 week ago
The advent of LLMs and machine learning-based applications "opened the door to a new wave of security threats," argues Google's security blog. (Including model and data poisoning, prompt injection, prompt leaking and prompt evasion.) So as part of the Linux Foundation's nonprofit Open Source Security Foundation, and in partnership with NVIDIA and HiddenLayer, Google's Open Source Security Team on Friday announced the first stable model-signing library (hosted at PyPI.org), with digital signatures letting users verify that the model used by their application "is exactly the model that was created by the developers," according to a post on Google's security blog. [S]ince models are an uninspectable collection of weights (sometimes also with arbitrary code), an attacker can tamper with them and achieve significant impact to those using the models. Users, developers, and practitioners need to examine an important question during their risk assessment process: "can I trust this model?" Since its launch, Google's Secure AI Framework (SAIF) has created guidance and technical solutions for creating AI applications that users can trust. A first step in achieving trust in the model is to permit users to verify its integrity and provenance, to prevent tampering across all processes from training to usage, via cryptographic signing... [T]he signature would have to be verified when the model gets uploaded to a model hub, when the model gets selected to be deployed into an application (embedded or via remote APIs) and when the model is used as an intermediary during another training run. Assuming the training infrastructure is trustworthy and not compromised, this approach guarantees that each model user can trust the model... The average developer, however, would not want to manage keys and rotate them on compromise. These challenges are addressed by using Sigstore, a collection of tools and services that make code signing secure and easy. By binding an OpenID Connect token to a workload or developer identity, Sigstore alleviates the need to manage or rotate long-lived secrets. Furthermore, signing is made transparent so signatures over malicious artifacts could be audited in a public transparency log, by anyone. This ensures that split-view attacks are not possible, so any user would get the exact same model. These features are why we recommend Sigstore's signing mechanism as the default approach for signing ML models. Today the OSS community is releasing the v1.0 stable version of our model signing library as a Python package supporting Sigstore and traditional signing methods. This model signing library is specialized to handle the sheer scale of ML models (which are usually much larger than traditional software components), and handles signing models represented as a directory tree. The package provides CLI utilities so that users can sign and verify model signatures for individual models. The package can also be used as a library which we plan to incorporate directly into model hub upload flows as well as into ML frameworks. "We can view model signing as establishing the foundation of trust in the ML ecosystem..." the post concludes (adding "We envision extending this approach to also include datasets and other ML-related artifacts.") Then, we plan to build on top of signatures, towards fully tamper-proof metadata records, that can be read by both humans and machines. This has the potential to automate a significant fraction of the work needed to perform incident response in case of a compromise in the ML world... To shape the future of building tamper-proof ML, join the Coalition for Secure AI, where we are planning to work on building the entire trust ecosystem together with the open source community. In collaboration with multiple industry partners, we are starting up a special interest group under CoSAI for defining the future of ML signing and including tamper-proof ML metadata, such as model cards and evaluation results.

Read more of this story at Slashdot.

EditorDavid

Python's PyPI Finally Gets Closer to Adding 'Organization Accounts' and SBOMs

3 months 1 week ago
Back in 2023 Python's infrastructure director called it "the first step in our plan to build financial support and long-term sustainability of PyPI" while giving users "one of our most requested features: organization accounts." (That is, "self-managed teams with their own exclusive branded web addresses" to make their massive Python Package Index repository "easier to use for large community projects, organizations, or companies who manage multiple sub-teams and multiple packages.") Nearly two years later, they've announced that they're "making progress" on its rollout... Over the last month, we have taken some more baby steps to onboard new Organizations, welcoming 61 new Community Organizations and our first 18 Company Organizations. We're still working to improve the review and approval process and hope to improve our processing speed over time. To date, we have 3,562 Community and 6,424 Company Organization requests to process in our backlog. They've also onboarded a PyPI Support Specialist to provide "critical bandwidth to review the backlog of requests" and "free up staff engineering time to develop features to assist in that review." (And "we were finally able to finalize our Terms of Service document for PyPI," build the tooling necessary to notify users, and initiate the Terms of Service rollout. [Since launching 20 years ago PyPi's terms of service have only been updated twice.] In other news the security developer-in-residence at the Python Software Foundation has been continuing work on a Software Bill-of-Materials (SBOM) as described in Python Enhancement Proposal #770. The feature "would designate a specific directory inside of Python package metadata (".dist-info/sboms") as a directory where build backends and other tools can store SBOM documents that describe components within the package beyond the top-level component." The goal of this project is to make bundled dependencies measurable by software analysis tools like vulnerability scanning, license compliance, and static analysis tools. Bundled dependencies are common for scientific computing and AI packages, but also generally in packages that use multiple programming languages like C, C++, Rust, and JavaScript. The PEP has been moved to Provisional Status, meaning the PEP sponsor is doing a final review before tools can begin implementing the PEP ahead of its final acceptance into changing Python packaging standards. Seth has begun implementing code that tools can use when adopting the PEP, such as a project which abstracts different Linux system package managers functionality to reverse a file path into the providing package metadata. Security developer-in-residence Seth Larson will be speaking about this project at PyCon US 2025 in Pittsburgh, PA in a talk titled "Phantom Dependencies: is your requirements.txt haunted?" Meanwhile InfoWorld reports that newly approved Python Enhancement Proposal 751 will also give Python a standard lock file format.

Read more of this story at Slashdot.

EditorDavid