The vulnerabilities of large language models (LLMs) expose them to malicious attacks beyond those that traditional systems face. In the next sections of this post, I give an overview of the new AI security guidelines published jointly by the US and UK in late November 2023. In the last section, I discuss four general issues that are important for traditional computer systems but crucial for AI systems that include machine learning.
The "Guidelines for Secure AI System Development" are not law. However, they are a precursor of what may be coming for artificial intelligence security, especially for securing LLMs. The new AI guidelines are like guardrails that prevent disasters by keeping people on the path towards increased AI security and decreased AI security risks.
Definitions and Scope
It is important to recognize that the term "AI", as used in the guidelines document, suggests a larger scope than the authors intended. The guidelines focus on generative AI security risks that stem from LLM security risks. They apply to anyone designing, creating, delivering, and/or using machine-learning (ML) systems.
Although the term "AI" is used throughout the guidelines document, it does not have its usual meaning there. The document states, "we use 'AI' to refer specifically to machine-learning (ML) applications. All types of ML are in scope." An endnote referenced at that point clarifies an exclusion: "As opposed to non-ML AI approaches such as rule-based systems."
AI model security (LLM security) is a new area of cybersecurity. Because the document focuses on generative AI security, it lists two identifiable features that define an ML application. An ML application must:
- "involve software components (models) that allow computers to recognize and bring context to patterns in data without the rules having to be explicitly programmed by a human
- generate predictions, recommendations, or decisions based on statistical reasoning"
The components of an ML system include its "hardware, software, workflows, and supply chains". All of these can contribute to LLM vulnerabilities. The term "adversarial machine learning (AML)" applies to any exploitation of vulnerabilities in ML systems. Examples include modifying the model's performance, performing unauthorized actions, extracting sensitive information from the model, and poisoning data by deliberately corrupting it.
Summarizing the Secure AI Guidelines
The guidelines document has four sections: the executive summary, the introduction, the guidelines, and suggestions for further reading. This section follows that same outline. The guidelines document is a TLP:CLEAR document, so there are no limits on the disclosure of the information it contains.
The guidelines specify that they are recommended "for providers of any systems that use artificial intelligence (AI)." The issue is AI security of all kinds, not just AI data security. How the system came into existence is irrelevant; it may have been built from nothing or built on top of other systems, services, or tools. The AI may be hosted by an organization or accessed through an external application programming interface (API).
An endnote defines who is a provider: "Here defined as a person, public authority, agency or other body that develops an AI system (or that has an AI system developed) and places that system on the market or puts it into service under its name or trademark."
However, the main text of the document explicitly places the responsibility for a secure AI on each organization that provides a component of an end-user's system. It states in bold font, "providers of AI components should take responsibility for the security outcomes of users further down the supply chain."
A user can be an end user or a provider that incorporates a component into the ML used by the end user. Example components are software, data, models, and remote services.
Models, pipelines, and/or systems should have security controls and mitigation processes. The most secure setting should be the default.
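One way to picture "the most secure setting as the default" is a configuration object whose defaults are all the restrictive choice, so that any relaxation has to be explicit and can be reported. The sketch below is only an illustration under my own assumptions: the class name, the setting names, and the values are hypothetical and do not come from the guidelines document.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelServiceConfig:
    """Hypothetical settings for a model-serving endpoint.

    Every default is the restrictive choice (an assumption of this sketch).
    """
    require_authentication: bool = True
    log_prompts_and_outputs: bool = True
    allow_remote_code_in_model_files: bool = False
    expose_debug_endpoints: bool = False
    max_prompt_chars: int = 4000


def changed_from_defaults(config: ModelServiceConfig) -> list:
    """Return the names of settings that differ from the secure defaults."""
    defaults = ModelServiceConfig()
    return [
        f.name
        for f in fields(defaults)
        if getattr(config, f.name) != getattr(defaults, f.name)
    ]


default_config = ModelServiceConfig()
relaxed_config = ModelServiceConfig(require_authentication=False)

print(changed_from_defaults(default_config))  # []
print(changed_from_defaults(relaxed_config))  # ['require_authentication']
```

A deployment review could then ask for a justification of every name in that list, rather than auditing the whole configuration each time.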
Implementation of the guidelines is intended to achieve three goals for AI systems.
- They are to "function as intended."
- They are to be "available when needed."
- They are to "work without revealing sensitive data to unauthorized parties."
The guidelines apply to decisions made during design, development, deployment, and operation and maintenance. The document advises that any stakeholder in these four key parts of the software development lifecycle read the guidelines.
The guidelines for trustworthy AI address both standard threats and those that are unique to AI systems.
- Design: Understanding risks and threat modeling, plus specific topics and trade-offs
- Development: Asset and technical debt management, supply chain security, and documentation
- Deployment: Infrastructure and model protection, processes for incident management, and responsible release
- Operation and maintenance: Issues arising after deployment, such as monitoring, logging, managing updates, and sharing information
The approach is to be "secure by default," which aligns with the practices defined in related documents:
- NCSC’s Secure development and deployment guidance
- NIST’s Secure Software Development Framework
- Secure by design principles on cisa.gov
Introduction of the Guidelines
The Introduction answers three questions:
- Why is AI security different?
- Who should read this document?
- Who is responsible for developing secure AI?
The emphasis is on "secure design principles" that prioritize three types of actions:
- "taking ownership of security outcomes for customers
- embracing radical transparency and accountability
- building organizational structure and leadership so secure by design is a top business priority"
Following these principles requires investing significant resources to prioritize features, mechanisms, and tools that safeguard customers and their data, so that costly redesigns are not needed later. The principles apply across the entire supply chain for an AI system.
The last section of the Introduction answers the question, "Who is responsible for developing secure AI?" The supply chains that provide users with ML systems are getting very complex. Because the next paragraphs are so important, they are fully quoted from the document.
"Where risks cannot be mitigated, the provider should be responsible for:
- informing users further down the supply chain of the risks that they (and if applicable) their own users are accepting
- advising them on how to use the component securely
Where system compromise could lead to tangible or widespread physical or reputational damage, significant loss of business operations, leakage of sensitive or confidential information and/or legal implications, AI cybersecurity risks should be treated as critical."
Guidelines for Secure AI System Development
The four areas of AI security concerns are Design (4), Development (4), Deployment (5), and Operation and maintenance (4). The number in parentheses indicates the number of guidelines in a section. Because the text of the guidelines is very detailed, this post includes only the guideline headers. It can be used as a checklist if responsibility for different guidelines is assigned to different staff members within a provider organization.
1. Secure Design
- Raise staff awareness of threats and risks
- Model the threats to your system
- Design your system for security as well as functionality and performance
- Consider security benefits and trade-offs when selecting your AI model
2. Secure Development
- Secure your supply chain
- Identify, track, and protect your assets
- Document your data, models, and prompts
- Manage your technical debt
3. Secure Deployment
- Secure your infrastructure
- Protect your model continuously
- Develop incident management procedures
- Release AI responsibly
- Make it easy for users to do the right things
4. Secure Operation and Maintenance
- Monitor your system's behavior
- Monitor your system's inputs
- Follow a secure-by-design approach to updates
- Collect and share lessons learned
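For readers who want to use the headers above as a checklist, here is a minimal sketch of one way to do it. The guideline headers are copied from the list above; the role names, the field names, and the decision to assign owners per section rather than per guideline are my own assumptions for illustration.

```python
# Guideline headers as listed in this post, grouped by lifecycle section.
GUIDELINES = {
    "Secure Design": [
        "Raise staff awareness of threats and risks",
        "Model the threats to your system",
        "Design your system for security as well as functionality and performance",
        "Consider security benefits and trade-offs when selecting your AI model",
    ],
    "Secure Development": [
        "Secure your supply chain",
        "Identify, track, and protect your assets",
        "Document your data, models, and prompts",
        "Manage your technical debt",
    ],
    "Secure Deployment": [
        "Secure your infrastructure",
        "Protect your model continuously",
        "Develop incident management procedures",
        "Release AI responsibly",
        "Make it easy for users to do the right things",
    ],
    "Secure Operation and Maintenance": [
        "Monitor your system's behavior",
        "Monitor your system's inputs",
        "Follow a secure-by-design approach to updates",
        "Collect and share lessons learned",
    ],
}


def build_checklist(owners):
    """Pair every guideline with the owner of its section, or None if unassigned."""
    return [
        {"section": section, "guideline": guideline,
         "owner": owners.get(section), "done": False}
        for section, items in GUIDELINES.items()
        for guideline in items
    ]


# Hypothetical role names; Secure Deployment is left unassigned on purpose.
owners = {
    "Secure Design": "security architect",
    "Secure Development": "ML engineering lead",
    "Secure Operation and Maintenance": "operations lead",
}

checklist = build_checklist(owners)
unassigned = [item["guideline"] for item in checklist if item["owner"] is None]

print(len(checklist))   # 17
print(len(unassigned))  # 5
```

Listing the unassigned items makes gaps in responsibility visible before a project starts, which is the point of treating the headers as a checklist.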
The final section of the guidelines document, on further reading, provides an annotated list of suggested readings in three areas: AI development, cybersecurity, and risk management.
The 12 endnotes provide definitions and links to additional information about AI and security issues. The intent is to reduce AI security vulnerabilities as much as possible.
Protection and Mitigation
Addressing the risks in AI and LLM security requires both prevention and mitigation. Prevention is protection against a successful attack, and mitigation is reducing the consequences of an attack that has been partially or fully successful.
The guidelines are grouped into the phases of the software lifecycle, but four issues apply in every phase. The people responsible and the terminology may change across phases, but the fundamental issues remain the same. The following examples, when considered carefully, can trigger additional questions and examples.
- Validity (including being ethical and unbiased, unless a bias is deliberate, such as promoting safety): How is the data collected and curated? How is the validity of the machine-learning model validated? How is its validity maintained through all types of updates?
- Verifiability: How are the data verified against the specifications for the system? What prevents hallucinations or at least makes the outputs verifiable by identifying sources?
- Traceability: Is there a step-by-step record of responsibility for each part of the AI system from the beginning of design through the latest update of the system in use?
- Shielding: What protections are in place from the beginning of design to the latest update to stop the various kinds of malicious attacks? For example, are the GPUs and any extra processors protected as well as the traditional core system? Is the system running on any untrusted devices? What prevents attacks through malicious prompts?
A crucial early step in an AI project is educating your entire team about (1) why protection and mitigation are very important and (2) what the consequences can be when a mitigation plan is not adequate. People cannot protect against an attack they do not understand, and they cannot mitigate well without understanding how to plan.
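To make the traceability question more concrete, the following is a minimal sketch of a step-by-step responsibility record in which each entry stores a hash of the previous one, so that later edits to the history can be detected. This is one common technique and not something the guidelines prescribe; the function names, field names, and example entries are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body):
    """Hash a record body deterministically (keys sorted before encoding)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log, actor, action):
    """Append a record that names who did what and links to the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"actor": actor, "action": action, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify_chain(log):
    """Return True only if every record is unmodified and correctly linked."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {key: record[key] for key in ("actor", "action", "prev_hash")}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True


# Hypothetical history for one model, from design to an update.
log = []
append_record(log, "data team", "curated training data v1")
append_record(log, "ML team", "trained model v1")
append_record(log, "ops team", "deployed model v1")

print(verify_chain(log))  # True

log[1]["actor"] = "someone else"  # simulate tampering with the history
print(verify_chain(log))  # False
```

A record like this only shows that the history has not been altered after the fact; it does not show that the entries were truthful when written, so it complements rather than replaces review and sign-off.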