By Courtenay Ngo, Microsoft
Introduction: The Case for AI Supply Chain Security
The rapid growth of Artificial Intelligence (AI) and Machine Learning (ML) adoption has led to AI/ML integration into nearly everything from search engines to enterprise applications. AI/ML models inherit the risks of traditional software vulnerabilities, such as insecure code libraries and dependencies, while also introducing novel risks specific to how AI/ML models are built, trained, and deployed. This expanded risk landscape has given rise to the AI Supply Chain (AI SC).
So, what is the AI Supply Chain?
The AI SC includes all components, data, tools, and infrastructure required for developing, training (including validation and testing), and running AI/ML models. In addition, an AI system is complex and can include a variety of capabilities, such as Large Language Models (LLMs), generative AI applications, ML capabilities and features, and agents and assistants.
The components, whether for models or systems, include IDEs (Integrated Development Environments) like VS Code, custom scripts and notebooks, training datasets, AI/ML libraries and frameworks (e.g., TensorFlow, PyTorch, Azure OpenAI API, LangChain, Ollama), pre-trained model weights, source code, and the hardware and training environments (such as Microsoft Azure AI Foundry). Like software components, AI components can be either created in-house or obtained from the open-source ecosystem, making their origin difficult to trace.
When components are untracked or poorly documented, they pose hidden risks that can lead to security breaches or model failures. For instance, poisoned or tampered model weights may result in incorrect or undesired output, such as model hallucinations or malicious outcomes, and outdated dependencies may introduce exploitable vulnerabilities that can lead to sensitive data leakage. This lack of transparency within the AI SC underscores the need for visibility. By clearly documenting how models are developed, organizations can better evaluate the risks and security of AI/ML models and systems. Adopting a tool such as the AI Software Bill of Materials (AI-SBOM) can provide visibility into AI SC components, making it easier to trace and audit all components, processes, and potential risks.
What is an AI-SBOM and Why is it Different from an SBOM?
Over the last decade, high-profile software supply chain attacks (e.g., SolarWinds) and critical vulnerabilities (e.g., Log4j) have triggered a push for greater software transparency and traceability. In response, the U.S. and the EU have published, respectively, OMB Memo M-26-05, Adopting a Risk-based Approach to Software and Hardware Security[1], and the EU Cyber Resilience Act (CRA)[2], requiring organizations to track software components using Software Bills of Materials (SBOMs). Among other things, SBOM files document software dependencies within software applications, enhancing visibility into both the general software development practices used and the specific software components built into these applications. Integrating SBOM data with resources such as vulnerability databases and security advisories can drive actionable insights and allow for quicker identification of risks[3]. However, SBOMs generated for an AI system may not contain information about AI and dataset components like training data, pre-trained models, model weights, or configurations, so AI-specific risks can go unnoticed. As a result, SBOMs alone are insufficient to manage the growing risks associated with building, deploying, and maintaining AI systems.
Recognizing these gaps in coverage, government agencies and private-sector organizations have created frameworks such as MITRE ATLAS and NIST's Secure Software Development Practices (SSDF) for Generative AI and Dual-Use Foundation Models that call for tamper-evident documentation of model development processes.[4],[5],[6] AI-SBOMs have emerged as a complementary solution that provides, like SBOMs, structured machine-readable records to track AI model components, software libraries, and dependencies. When signed and versioned, an AI-SBOM becomes a tamper-evident artifact that provides traceability and a verifiable attestation of the integrity of AI model inputs and processes. This traceability extends to components hidden behind an API or abstracted by cloud service providers, which would otherwise be undocumented and unverifiable.
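As a minimal sketch of what "signed and versioned" can mean in practice (using a shared-key HMAC for brevity; a real attestation pipeline would use asymmetric signatures, e.g., via Sigstore or X.509 certificates, and the function names here are illustrative):

```python
import hashlib
import hmac
import json

def sign_sbom(sbom: dict, key: bytes) -> dict:
    """Attach a content hash and an integrity tag to an AI-SBOM."""
    # Canonical serialization so the hash is reproducible byte-for-byte.
    payload = json.dumps(sbom, sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256(payload).hexdigest()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"sbom": sbom, "sha256": digest, "signature": tag}

def verify_sbom(signed: dict, key: bytes) -> bool:
    """Recompute the tag from the SBOM body and compare in constant time."""
    payload = json.dumps(signed["sbom"], sort_keys=True,
                         separators=(",", ":")).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])
```

Because the tag covers the canonical serialization of the whole document, any change to any recorded component invalidates verification, which is what makes the artifact tamper-evident.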
Why We Need AI-SBOMs
Whether used to build trust in the AI models we consume or to meet regulatory compliance requirements, AI-SBOMs are multi-purpose and span multiple use cases. Going back to AI SC risks, however, the most routine use cases are risk management and incident response. Security leaders increasingly emphasize that AI cannot be treated like "just another tool in the stack". If stakeholders lose confidence in how AI is built, trained, or deployed, the resulting loss can outweigh operational gains. Protecting trust requires protecting the infrastructure, and visibility into the AI SC (data pipelines, model artifacts, AI frameworks) will help determine whether a model can be relied upon[7].
Organizations such as Microsoft Research, the Microsoft AI Red Team (AIRT), MITRE ATLAS, and OWASP have researched and documented a range of security threats unique to AI models. These threats range from compromised dependency chains in AI frameworks, to impersonation of model publishers, to backdoor attacks on AI models, to prompt injections leading to AI model compromise, and together they expose a systemic lack of visibility into AI models and how their components are developed.[8],[9] The value proposition of an AI-SBOM is that it provides visibility and the means to verify these components.
Building on the concept of a traditional SBOM, an AI-SBOM captures AI- and dataset-specific metadata[10], including:
- Dataset provenance and lineage
- Model identity, architecture, hyperparameters, license
- AI libraries and frameworks used for training and preprocessing data
- Security artifacts like hashes and checksums
These metadata elements are essential for detecting file tampering and software supply chain vulnerabilities. By extension, AI-SBOMs provide a forensic record that can support audits, incident response, and regulatory reporting. AI-SBOMs will become a foundational requirement as governments and standards bodies continue to define expectations for AI assurance.
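To make these elements concrete, the following sketch assembles a minimal AI-SBOM record covering the four metadata categories. The field names are hypothetical (a production AI-SBOM would follow a standard schema such as SPDX or CycloneDX); the per-file SHA-256 hashes are what enable the file-tampering detection discussed here:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """SHA-256 of an artifact file, recorded for later tamper detection."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def build_ai_sbom_record(model_dir: Path) -> dict:
    """Assemble a minimal AI-SBOM record (illustrative values only)."""
    return {
        "dataset": {                              # provenance and lineage
            "name": "example-corpus",
            "source": "https://example.org/data",
            "version": "2025-06",
        },
        "model": {                                # identity, architecture, license
            "name": "example-classifier",
            "architecture": "transformer",
            "hyperparameters": {"lr": 3e-4, "epochs": 5},
            "license": "MIT",
        },
        "frameworks": [                           # training/preprocessing libraries
            {"name": "torch", "version": "2.3.1"},
            {"name": "numpy", "version": "1.26.4"},
        ],
        "artifacts": [                            # security artifacts: hashes
            {"file": p.name, "sha256": file_sha256(p)}
            for p in sorted(model_dir.glob("*")) if p.is_file()
        ],
    }
```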
When and Where in the ML Pipeline Lifecycle to Generate and Scan AI-SBOMs?
AI/ML model development, like software development's SDLC, has its own lifecycle covering design, deployment, monitoring, and ongoing improvement. Each phase of the model development lifecycle introduces artifacts (datasets, model weights, code, configurations) that affect model behavior, quality, and operation. Since each of these artifacts evolves across the AI/ML lifecycle, AI-SBOM generation cannot be treated as a single event. Instead, it should occur at multiple points to maintain continuous visibility into a model's lineage, dependencies, and security posture.
The joint guidance A Shared Vision of Software Bill of Materials for Cybersecurity, published by CISA and endorsed by a large number of government agencies worldwide, affirms that SBOMs can and should be generated at multiple points in the software lifecycle. It advises using automated scanning tools to detect components during development and applying automated analysis methods to existing software, whether source or binary.[11] This guidance applies equally to generating AI-SBOMs at various points throughout the AI/ML model development lifecycle, ensuring complete visibility into model lineage, dependencies, and security posture.
In the context of SBOMs, CISA recommends generating different types of SBOMs as each type provides specific information about the design, source, build, and deployment.[12] Each type of SBOM, from design to run-time, has benefits and limitations, can reflect different sources of truth depending on when and how it is generated, and is use case dependent. It is important to emphasize that SBOM generation is not a one-time event and can occur as early as the design stage as well as at different points throughout the SDLC.[13] The NTIA Minimum Elements for SBOMs recommend updating SBOMs whenever a component changes.[14]
AI-SBOM generation can follow this same guidance throughout the AI/ML lifecycle with application at the following key stages:
AI/ML model lifecycle stages
- Data preprocessing: After data preprocessing, capture information about the training data (e.g., version, data licenses, acquisition source). This will enable traceability of the training data.
- Model Training: At the training/fine-tuning stage, capture "Build" artifacts about the model's architecture (e.g., model weights, hyperparameters, library versions). This will support reproducibility of the model and security scanning.
- Model Deployment: At deployment, when the model is released into production, generate a "Deployed" AI-SBOM. This captures the pipeline details, configuration files, and final model files, hashed and signed to provide a tamper-evident record; the deployed AI-SBOM should then be validated. In scenarios where organizations lack access to the original training or build pipelines (e.g., when acquiring open-source or third-party models), it is still possible to generate an "Analyzed" AI-SBOM by analyzing model artifacts such as model cards, READMEs, code, and configuration files to detect and parse metadata such as libraries, configurations, and architecture details from model repositories like Hugging Face and Azure AI Foundry.
- Retraining and Repurposing: Regenerate the AI-SBOM whenever the model, its datasets, or its dependencies are updated.
AI models can evolve in many ways, such as through transfer learning, fine-tuning, and dataset modifications, making it difficult to determine provenance and assess change over time. Generating AI-SBOMs throughout the AI model lifecycle records these changes, enabling organizations to trace the origin, ownership, and integrity of AI models and their components over time.
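One lightweight way to record such lifecycle changes is to chain each stage's AI-SBOM snapshot to the previous one by hash, so the lineage itself is tamper-evident. A minimal sketch (the stage names and metadata fields are illustrative, not a standard format):

```python
import hashlib
import json

def add_stage(history: list, stage: str, metadata: dict) -> list:
    """Append a lifecycle-stage snapshot, chained to the previous one
    by hash so any later edit to an earlier record is detectable."""
    prev_hash = history[-1]["record_hash"] if history else "0" * 64
    record = {"stage": stage, "metadata": metadata, "previous": prev_hash}
    payload = json.dumps(record, sort_keys=True).encode()
    record["record_hash"] = hashlib.sha256(payload).hexdigest()
    history.append(record)
    return history

# One AI-SBOM snapshot per lifecycle stage (illustrative metadata only).
lineage = []
add_stage(lineage, "data-preprocessing", {"dataset_version": "2025-06"})
add_stage(lineage, "training", {"framework": "torch==2.3.1"})
add_stage(lineage, "deployment", {"endpoint_config": "prod.yaml"})
```

Each record's `previous` field commits to the record before it, so the chain reconstructs the model's lineage from preprocessing through deployment.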
Challenges in Operationalizing AI-SBOMs
AI-SBOMs offer many key benefits for strengthening trust in the AI SC and managing risk, as this paper has highlighted. However, there are challenges in operationalizing AI-SBOMs: the immaturity of automated tooling for capturing comprehensive metadata across all AI/ML lifecycle stages, the inaccessibility of critical model and data information due to inconsistent data governance or proprietary constraints, and the difficulty of determining which metadata is most relevant for a given organizational use case. As organizations strive to create AI-SBOMs that support transparency, reproducibility, and compliance, they must also navigate complex scenarios such as restricted access to closed-source model details, data privacy concerns, and the need for adaptable frameworks that can adjust to evolving security, regulatory, and operational requirements.
Observed challenges in automating AI-SBOM generation:
- Tooling for automating AI-SBOM generation. Tools that generate AI-SBOMs capturing software libraries, datasets, model weights, configurations, and other artifacts are underdeveloped. The absence of automated tooling to extract this metadata from training pipelines and model artifacts limits adoption.
- Incomplete and missing information.
- AI/ML model cards are now generally available for most models hosted on platforms like Hugging Face and Azure AI Foundry. However, model cards have shortcomings, such as a lack of standardization in the information published about a model and missing software package details, which are essential for effective vulnerability management.
- AI/ML model components may not be discoverable or easily retrieved due to incompatible or inconsistent data sources, leading to incomplete and inaccurate AI-SBOMs. Data governance and metadata cataloguing practices can remediate this issue and ensure the metadata is retrievable by automated tooling. For example, when an organization lacks sufficient metadata sources, post-generation processes can be introduced to review and approve generated AI-SBOMs. This process, supported by internal researchers or analysts, can check for completeness and identify areas needing manual enrichment. These workflows would correct or populate unknown values in critical AI-SBOM fields so that they pass a completeness/quality bar for the desired use cases.
- AI/ML models can be created from curated datasets that are sensitive (e.g., patient data, customer behavior, proprietary algorithms). As a result, organizations may intentionally withhold this data from external or unauthorized parties to protect intellectual property (IP) or address privacy concerns.
- As a mitigation, methods such as redacting private components using encryption and signed attestations [Framing Software Component Transparency] can balance transparency with the risk of disclosing private data.
- Closed-source AI/ML model details (e.g., where information about the training data, architecture, software libraries and dependencies are proprietary) may be unavailable to external organizations. In this scenario, AI-SBOMs may have only generic information such as vendor name, model version, and file hash, rather than the model’s full lineage.
- Determining and Aligning AI-SBOM Information with Use Cases. Another challenge in operationalizing AI-SBOMs is knowing what information to include. AI-SBOMs are not one-size-fits-all: to be useful, an AI-SBOM must contain the information its use case demands, and an AI-SBOM produced for one use case (e.g., regulatory compliance) may not be meaningful or actionable for another (e.g., AI security assurance). Different use cases therefore require different metadata. A few concrete examples illustrate this challenge:
- An organization whose focus is on reproducibility may prioritize capturing model architecture, data preparation and preprocessing steps, and dataset versions.
- An organization preparing for regulatory compliance may need to include AI/ML pipeline environment, licensing, data usage rights, or third-party dependencies.
- An organization concerned with security and vulnerabilities may require hashes of critical artifact files and versions of AI/ML frameworks, software libraries, and dependencies.
In such cases, misalignment between use cases and AI-SBOM content can lead to gaps in meeting the expectations of the consumer of the AI-SBOM.
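This use-case dependence can be made mechanical: check the same AI-SBOM against a different required-field set per use case. A sketch with hypothetical field paths (no standard defines these names):

```python
# Hypothetical per-use-case field requirements, written as
# "section.field" paths into a two-level AI-SBOM dictionary.
COMPLIANCE = {"model.license", "dataset.usage_rights"}
SECURITY = {"model.weights_sha256", "model.framework_version"}

def missing_fields(sbom: dict, required: set) -> set:
    """Required fields that are absent or marked unknown in the AI-SBOM."""
    unknown = {"", "NOASSERTION", "UNKNOWN"}
    gaps = set()
    for path in sorted(required):
        section, field = path.split(".")
        value = sbom.get(section, {}).get(field)
        if value is None or (isinstance(value, str)
                             and value.strip().upper() in unknown):
            gaps.add(path)
    return gaps
```

The same AI-SBOM can pass one bar and fail another: a record with complete hashes and framework versions satisfies the security set while an unasserted `dataset.usage_rights` still fails the compliance set, which is exactly the misalignment described above.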
Current Work
There are ongoing efforts to enhance AI supply chain transparency and security by defining AI-SBOM manifest standards. The Linux Foundation SPDX and OWASP CycloneDX initiatives have both extended existing SBOM specifications to support the inclusion of AI and dataset details. In addition, the OWASP GenAI project is focused on developing open-source tooling for automated AI-SBOM generation.
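As a rough illustration of what such an extended manifest looks like, a CycloneDX-style entry for a model component might resemble the fragment below. The field names reflect the author's reading of the CycloneDX ML-BOM extension and may not match the current schema exactly; consult the specification for the authoritative structure.

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "machine-learning-model",
      "name": "example-classifier",
      "version": "1.0.0",
      "modelCard": {
        "modelParameters": {
          "architectureFamily": "transformer",
          "datasets": [{"ref": "example-corpus@2025-06"}]
        }
      }
    }
  ]
}
```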
Conclusion
As AI/ML models become tightly integrated into traditional software supply chains, they bring additional risk. This new landscape highlights the rise of the AI Supply Chain and the importance of understanding and tracking AI components. While SBOMs help secure the traditional software supply chain, they fall short of enabling visibility into AI components. AI-SBOMs extend the purview of SBOMs to include AI-specific elements such as training datasets, model weights, and AI libraries and frameworks, providing traceability, transparency, and trust in AI/ML models. However, operationalizing AI-SBOMs comes with its own challenges: gaps in tooling, missing or inaccurate information, uncertainty about which data to collect, and misaligned use cases can all hinder broad adoption and effective application.
To effectively leverage AI-SBOMs, organizations must adopt a lifecycle-based approach, clearly define use cases, and establish data governance and cataloging to appropriately generate meaningful AI-SBOMs. In addition, organizations should be prepared to share them across ecosystems that may have different requirements and expectations. By doing so, they will not only strengthen their own security posture but also contribute to a more secure and transparent AI ecosystem. The path toward adoption will require continuing collaboration across government, industry, and research to refine standards, tooling, and practices that will make AI-SBOMs meaningful and actionable contributors to a more resilient AI Supply Chain.
References:
- Cybersecurity and Infrastructure Security Agency. Framing Software Component Transparency (2024), https://www.cisa.gov/resources-tools/resources/framing-software-component-transparency-2024.
- Cybersecurity and Infrastructure Security Agency (CISA). Joint Guidance: A Shared Vision of Software Bill of Materials for Cybersecurity. September 2025. https://www.cisa.gov/sites/default/files/2025-09/joint-guidance-a-shared-vision-of-software-bill-of-materials-for-cybersecurity_508c.pdf.
- Cybersecurity and Infrastructure Security Agency. Types of Software Bill of Material (SBOM) Documents. Accessed October 2025. https://www.cisa.gov/sites/default/files/2023-04/sbom-types-document-508c.pdf.
- European Union. Regulation (EU) 2024/2847 (Cyber Resilience Act). EUR-Lex, 2024. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R2847
- Federal Register. Executive Order 14028, Improving the Nation’s Cybersecurity, Section 4. May 2021.
- Microsoft. “Securing the Future of AI/ML at Microsoft.” March 2025. https://learn.microsoft.com/en-us/security/ai-ml/securing-the-future-of-ai-ml-at-microsoft.
- MITRE. ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems. Accessed October 2025. https://atlas.mitre.org/techniques/AML.T0010.
- National Institute of Standards and Technology (NIST). Secure Software Development Practices for Generative AI and Dual-Use Foundation Models: An SSDF Community Profile (SP 800-218A). May 2025. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-218A.pdf.
- National Telecommunications and Information Administration (NTIA). The Minimum Elements for a Software Bill of Materials (SBOM). July 12, 2021. https://www.ntia.gov/report/2021/minimum-elements-software-bill-materials-sbom.
- Oracle A-Team Contributors. Securing AI: A CISO’s Perspective on Trust and Resilience. September 5, 2025. https://www.ateam-oracle.com/securing-ai-a-cisos-perspective-on-trust-and-resilience
- The System Package Data Exchange (SPDX) Specification Version 3.0.1. Accessed October 2025. https://spdx.github.io/spdx-spec/v3.0.1/
- Office of Management and Budget (OMB). M-26-05: Adopting a Risk-based Approach to Software and Hardware Security. https://www.whitehouse.gov/wp-content/uploads/2026/01/M-26-05-Adopting-a-Risk-based-Approach-to-Software-and-Hardware-Security.pdf.
- OWASP. Secure AI/ML Model Ops Cheat Sheet. Accessed October 2025. https://cheatsheetseries.owasp.org/cheatsheets/Secure_AI_Model_Ops_Cheat_Sheet.html.
[1] https://www.whitehouse.gov/wp-content/uploads/2026/01/M-26-05-Adopting-a-Risk-based-Approach-to-Software-and-Hardware-Security.pdf.
[2] European Union EUR-Lex (2024): https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R2847&qid=1762357749093
[3] Cybersecurity and Infrastructure Security Agency, Joint Guidance: A Shared Vision of Software Bill of Materials for Cybersecurity (2025), https://www.cisa.gov/sites/default/files/2025-09/joint-guidance-a-shared-vision-of-software-bill-of-materials-for-cybersecurity_508c.pdf
[4] MITRE, ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems, https://atlas.mitre.org/.
[5] National Institute of Standards and Technology (NIST). Secure Software Development Practices for Generative AI and Dual-Use Foundation Models: An SSDF Community Profile (SP 800-218A), May 2025, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-218A.pdf.
[6] Microsoft. “Securing the Future of AI/ML at Microsoft,” March 2025, https://learn.microsoft.com/en-us/security/ai-ml/securing-the-future-of-ai-ml-at-microsoft.
[7] Oracle A-Team Contributors. Securing AI: A CISO’s Perspective on Trust and Resilience. September 5, 2025. https://www.ateam-oracle.com/securing-ai-a-cisos-perspective-on-trust-and-resilience
[8] MITRE, ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems.
[9] OWASP. Secure AI/ML Model Ops Cheat Sheet, https://cheatsheetseries.owasp.org/cheatsheets/Secure_AI_Model_Ops_Cheat_Sheet.html.
[10] The System Package Data Exchange (SPDX) Specification Version 3.0.1, https://spdx.github.io/spdx-spec/v3.0.1/
[11] Cybersecurity and Infrastructure Security Agency, Joint Guidance.
[12] Cybersecurity and Infrastructure Security Agency. Framing Software Component Transparency (2024), https://www.cisa.gov/resources-tools/resources/framing-software-component-transparency-2024.
[13] Cybersecurity and Infrastructure Security Agency. Types of Software Bill of Material (SBOM) Documents, https://www.cisa.gov/sites/default/files/2023-04/sbom-types-document-508c.pdf.
[14] The Minimum Elements for a Software Bill of Materials (SBOMS): Pursuant to Executive Order 14028 on Improving the Nation’s Cybersecurity, https://www.ntia.gov/sites/default/files/publications/sbom_minimum_elements_report_0.pdf