0 SECURING MACHINE LEARNING ALGORITHMS DECEMBER 2021 SECURING MACHINE LEARNING ALGORITHMS December 2021 1 ABOUT ENISA The European Union Agency for Cybersecurity,ENISA,is the Unions agency dedicated to achieving a high common level of cybersecurity across Europe.Established in 2004 and strengthened by the EU Cybersecurity Act,the European Union Agency for Cybersecurity contributes to EU cyber policy,enhances the trustworthiness of ICT products,services and processes with cybersecurity certification schemes,cooperates with Member States and EU bodies,and helps Europe prepare for the cyber challenges of tomorrow.Through knowledge sharing,capacity building and awareness raising,the Agency works together with its key stakeholders to strengthen trust in the connected economy,to boost resilience of the Unions infrastructure,and,ultimately,to keep Europes society and citizens digitally secure.More information about ENISA and its work can be found here:www.enisa.europa.eu.CONTACT For contacting the authors please use infoenisa.europa.eu For media enquiries about this paper,please use pressenisa.europa.eu EDITORS Apostolos Malatras,Ioannis Agrafiotis,Monika Adamczyk,ENISA ACKNOWLEDGEMENTS We would like to thank the Members and Observers of the ENISA ad hoc Working Group on Artificial Intelligence for their valuable input and feedback.LEGAL NOTICE Notice must be taken that this publication represents the views and interpretations of ENISA,unless stated otherwise.This publication should not be construed to be a legal action of ENISA or the ENISA bodies unless adopted pursuant to the Regulation(EU)No 2019/881.This publication does not necessarily represent state-of the-art and ENISA may update it from time to time.Third-party sources are quoted as appropriate.ENISA is not responsible for the content of the external sources including external websites referenced in this publication.This publication is intended for information purposes only.It must be accessible free of charge.Neither ENISA nor any person acting on its behalf is responsible for the use that might be made of the information contained in this publication.COPYRIGHT NOTICE European Union Agency for Cybersecurity(ENISA),2021 Reproduction is authorised provided the source is acknowledged.Copyright for the image on the cover:Shutterstock For any use or reproduction of photos or other material that is not under the ENISA copyright,permission must be sought directly from the copyright holders.ISBN:978-92-9204-543-2 DOI:10.2824/874249-Catalogue Nr.:TP-06-21-153-EN-N SECURING MACHINE LEARNING ALGORITHMS December 2021 2 TABLE OF CONTENTS EXECUTIVE SUMMARY 3 1.INTRODUCTION 4 1.1 OBJECTIVES 4 1.2 METHODOLOGY 4 1.3 TARGET AUDIENCE 5 1.4 STRUCTURE 6 2.MACHINE LEARNING ALGORITHMS TAXONOMY 7 2.1 MAIN DOMAIN AND DATA TYPES 8 2.2 LEARNING PARADIGMS 9 2.3 NAVIGATING THE TAXONOMY 10 2.4 EXPLAINABILITY AND ACCURACY 10 2.5 AN OVERVIEW OF AN END-TO-END MACHINE LEARNING LIFECYCLE 11 3.ML THREATS AND VULNERABILITIES 13 3.1 IDENTIFICATION OF THREATS 13 3.2 VULNERABILITIES MAPPED TO THREATS 16 4.SECURITY CONTROLS 18 4.1 SECURITY CONTROLS RESULTS 18 5.CONCLUSION 26 A ANNEX:TAXONOMY OF ALGORITHMS 28 B ANNEX:MAPPING SECURITY CONTROLS TO THREATS 34 C ANNEX:IMPLEMENTING SECURITY CONTROLS 38 D ANNEX:REFERENCES 43 SECURING MACHINE LEARNING ALGORITHMS December 2021 3 EXECUTIVE SUMMARY The vast developments in digital technology influence every aspect of our daily lives.Emerging technologies,such as Artificial Intelligence(AI),which are in the epicentre of the digital 
evolution,have accelerated the digital transformation contributing in social and economic prosperity.However,the application of emerging technologies and AI in particular,entails perils that need to be addressed if we are to ensure a secure and trustworthy environment.In this report,we focus on the most essential element of an AI system,which are machine learning algorithms.We review related technological developments and security practices to identify emerging threats,highlight gaps in security controls and recommend pathways to enhance cybersecurity posture in machine learning systems.Based on a systematic review of relevant literature on machine learning,we provide a taxonomy for machine learning algorithms,highlighting core functionalities and critical stages.The taxonomy sheds light on main data types used by algorithms,the type of training these algorithms entail(supervised,unsupervised)and how output is shared with users.Particular emphasis is given to the explainability and accuracy of these algorithms.Next,the report presents a detailed analysis of threats targeting machine learning systems.Identified threats include inter alia,data poisoning,adversarial attacks and data exfiltration.All threats are associated to particular functionalities of the taxonomy that they exploit,through detailed tables.Finally,we examine mainstream security controls described in widely adopted standards,such as ISO 27001 and NIST Cybersecurity framework,to understand how these controls can effectively detect,deter and mitigate harms from the identified threats.To perform our analysis,we map all the controls to the core functionalities of machine learning systems that they protect and to the vulnerabilities that threats exploit in these systems.Our analysis indicates that the conventional security controls,albeit very effective for information systems,need to be complemented by security controls tailored to machine learning functionalities.To identify these machine-learning controls,we conduct a systematic review of relevant literature,where academia and research institutes propose ways to avoid and mitigate threats targeting machine learning algorithms.Our report provides an extensive list of security controls that are applicable only for machine learning systems,such as“include adversarial examples to training datasets”.For all controls,we map the core functionality of machine learning algorithms that they intend to protect to the vulnerabilities that threats exploit.Our findings indicate that there is no unique strategy in applying a specific set of security controls to protect machine learning algorithms.The overall cybersecurity posture of organisations who use machine learning algorithms can be enhanced by carefully choosing controls designed for these algorithms.As these controls are not validated in depth,nor standardised in how they should be implemented,further research should focus on creating benchmarks for their effectiveness.We further identified cases where the deployment of security controls may lead to trade-offs between security and performance.Therefore,the context in which controls are applied is crucial and next steps should focus on considering specific use cases and conducting targeted risk assessments to better understand these trade-offs.Finally,given the complexity of securing machine learning systems,governments and related institutions have new responsibilities in raising awareness regarding the impact of threats on machine learning.It is important to educate data scientists 
on the perils of threats and on the design of security controls before machine learning algorithms are used in organisations environments.By engaging experts in machine learning in cybersecurity issues,we may create the opportunity to design innovative security solutions and mitigate the emerging threats on machine learning systems.This report provides a taxonomy for machine learning algorithms,a detailed analysis of threats and security controls in widely adopted standards SECURING MACHINE LEARNING ALGORITHMS December 2021 4 1.INTRODUCTION Artificial Intelligence(AI)has grown significantly in recent years and driven by computational advancements has found wide applicability.By providing new opportunities to solve decision-making problems intelligently and automatically,AI is being applied to more and more use cases in a growing number of sectors.The benefits of AI are significant and undeniable.However,the development of AI is also accompanied by new threats and challenges,which relevant professionals will have to face.In 2020,ENISA published a threat landscape report on AI1.This report,published with the support of the Ad-Hoc Working Group on Artificial Intelligence Cybersecurity2,presents the Agencys active mapping of the AI cybersecurity ecosystem and its threat landscape.This threat landscape not only lays the foundation for upcoming cybersecurity policy initiatives and technical guidelines,but also stresses relevant challenges.Machine learning(ML),which can be defined as the ability for machines to learn from data to solve a task without being explicitly programmed to do so,is currently the most developed and promising subfield of AI for industrial and government infrastructures.It is also the most commonly used subfield of AI in our daily lives.ML algorithms and their specificities,such as the fact that they need large amount of data to learn,make them the subject of very specific cyber threats that project teams must consider.The aim of this study is to help project teams identify the specific threats that can target ML algorithms,associated vulnerabilities,and security controls for addressing these vulnerabilities.Building on the ENISA AI threat landscape mapping,this study focuses on cybersecurity threats specific to ML algorithms.Furthermore,vulnerabilities related to the aforementioned threats and importantly security controls and mitigation measures are proposed.The adopted description of AI is a deliberate simplification of the state of the art regarding that vast and complex discipline with the intent of not precisely or comprehensively define it but rather pragmatically contextualise the specific technique of machine learning.1.1 OBJECTIVES The objectives of this publication are:To produce a taxonomy of ML techniques and core functionalities to establish a logical link between threats and security controls.To identify the threats targeting ML techniques and the vulnerabilities of ML algorithms,as well as the relevant security controls and how these are currently being used in the field to ensure minimisation of security risks.To propose recommendations on future steps to enhance cybersecurity in systems that rely on ML techniques.1.2 METHODOLOGY To produce this report,the work was divided into three stages.At the core of the methodology was an extensive literature review(full list of references may be found in Annex D).The aim 1 https:/www.enisa.europa.eu/publications/artificial-intelligence-cybersecurity-challenges 2 See 
https:/www.enisa.europa.eu/topics/iot-and-smart-infrastructures/artificial_intelligence/ad-hoc-working-group/adhoc_wg_calls SECURING MACHINE LEARNING ALGORITHMS December 2021 5 was to consult documents that are more specific to ML algorithms in general in order to build the taxonomy,and to consult documents more specific to security to identify threats,vulnerabilities,and security controls.At the end of the systematic review,more than 200 different documents(of which a hundred are related to security)on various algorithms of ML had been collected and analysed.First,we introduced a high-level ML taxonomy.To understand the vulnerabilities of different ML algorithms,how they can be threatened and protected,it is crucial to have an overview of their core functionalities and lifecycle.To do so,a first version of the desk research on ML-focussed sources was compiled and the ML lifecycle presented in ENISAs work on AI cybersecurity challenges was consulted3.We then analysed and synthesised all references to produce a first draft of the taxonomy.The draft was submitted and interviews were held with the ENISA Ad-Hoc Working Group on Artificial Intelligence Cybersecurity.After considering their feedback,the ML taxonomy and lifecycle were validated.The second step was to identify the cybersecurity threats that could target ML algorithms and potential vulnerabilities.For this task,the threat landscape from ENISAs report on AI cybersecurity challenges was the starting point,which was then enriched through desk research with sources related to the security of ML algorithms.Additionally,the expertise of the ENISA Ad-Hoc Working Group on Artificial Intelligence Cybersecurity was sought.This work allowed us to select threats and identify associated vulnerabilities.Subsequently,they were linked to the previously established ML taxonomy.The last step of this work was the identification of the security controls addressing the vulnerabilities.To do this,we utilised the desk research and enriched it with the most relevant standard security controls from ISO 27001/2 and the NIST 800-53 framework.The output was reviewed with the experts of the ENISA Ad-Hoc Working Group on Artificial Intelligence Cybersecurity.This work allowed us to identify security controls that were then linked to the ML taxonomy.It is important to note that we opted to enrich the ML-targeted security controls with more conventional ones to highlight that applications using ML must also comply with more classic controls in order to be sufficiently protected.Considering measures that are specific to ML would only give a partial picture of the security work needed on these applications.1.3 TARGET AUDIENCE The target audience of this report can be divided into the following categories:Public/governmental sector(EU institutions and agencies,Member States regulatory bodies,supervisory authorities in the field of data protection,military and intelligence agencies,law enforcement community,international organisations,and national cybersecurity authorities):to help them with their risk analysis,identify threats and understand how to secure ML algorithms.Industry(including Small and Medium Enterprises(SMEs)that makes use of AI solutions and/or is engaged in cybersecurity,including operators of essential services:to help them with their risk analysis,identify threats and understand how to secure ML algorithms.AI technical community,AI cybersecurity experts and AI experts(designers,developers,ML experts,data scientists,etc.)with an interest in 
developing secure solutions and in integrating security and privacy by design in their solutions.Cybersecurity community:to identify threats and security controls that can apply to ML algorithms.3 https:/www.enisa.europa.eu/publications/artificial-intelligence-cybersecurity-challenges SECURING MACHINE LEARNING ALGORITHMS December 2021 6 Academia and research community:to obtain knowledge on the topic of securing ML algorithms and identify existing work in the field.Standardisation bodies:to help identify key aspects to consider regarding securing ML algorithms.1.4 STRUCTURE The report aims to help the target audience to identify the cyber threats to consider and the security controls to deploy in order to secure their ML applications.Accordingly,the report is structure into three sections:ML algorithms taxonomy:first,a taxonomy to describe the main characteristics of the algorithms is defined.The different ML algorithms are categorised based on their core functionalities(e.g.,the learning paradigm)and the lifecycle of a ML algorithm is defined.Identification of relevant threats and vulnerabilities:secondly,a list of the cybersecurity threats and associated vulnerabilities to consider for ML algorithms is defined.Threats are mapped to the taxonomy to highlight the link between them,the core functionalities,and the lifecycle of the ML algorithms.Security controls:thirdly,a list of security controls for addressing the previously considered vulnerabilities is given.They are also mapped to the ML taxonomy.This report focuses on threats that target ML algorithms and on the associated security controls.It is important to note that this publication examines security controls that are specific to ML algorithms as well as standard security controls that are also applicable to ML algorithms and systems making use of them.To use this publication effectively,it is important to note that:As is the case for any application,when using ML,one must also consider traditional security standards(e.g.ISO 27001/2,NIST 800-53),because ML applications are subject not only to AI/ML specific threats but also to general nature cybersecurity threats.The context of the application(e.g.manipulated data,business case,deployment)must be considered to correctly assess the risks and prioritise deployment of the security controls accordingly.SECURING MACHINE LEARNING ALGORITHMS December 2021 7 2.MACHINE LEARNING ALGORITHMS TAXONOMY One of the objectives of this work was to devise a(non-exhaustive)taxonomy,to support the process of identifying which specific threats can target ML algorithms,their associated vulnerabilities,and security controls for addressing these vulnerabilities.An important disclaimer needs to be made concerning this taxonomy,namely that it is not meant to be complete or exhaustive when it comes to ML,instead it aims to support the security analysis of ML algorithms in this report.Based on the desk research and interviews with experts of the ENISA AI Working group,we identified 40 of the most commonly used ML algorithms.A taxonomy was built based on the analysis of these algorithms.In particular,it was noted that ML algorithms were driven mainly by the learning paradigms and the problem they address(main domain).These aspects were therefore chosen to form the key taxonomy dimensions,as seen in Figure 1.It should be noted that Annex A provides a complete listing of the 40 algorithms and their mapping to the features of the taxonomy,whereas the Figure serves for illustration purposes.Figure 1:Machine 
Learning Algorithm taxonomy SECURING MACHINE LEARNING ALGORITHMS December 2021 8 There is a strong correlation between the domain of application(the problem being addressed)and the data type which is being worked on,as well as between data environments and learning paradigm.Thus,further dimensions of the taxonomy were introduced accordingly.2.1 MAIN DOMAIN AND DATA TYPES Different algorithms are used in different domains of ML.Therefore,the algorithms have been categorised according to the main domains represented.Three main domains were(non-exhaustively)selected,namely Computer Vision,NLP(Natural Language Processing)&Speech Processing(understanding and generating speech),and Classic Data Science.The inputs that are given to a ML algorithm are data and therefore,the algorithms can be categorised based on the types of data that is fed into them.In most cases,specific types of data are used in certain domains of ML.Indeed,all the algorithms used in computer vision are fed with images and videos,in the same way that all algorithms used in Natural Language Processing are fed with text4.In Table 1,the main domains and the type of data used in each of them are listed.Table 1:Main domains and data types Main domain Data type Definition Computer Vision Image Visual representation of a matrix of pixels constituted of 1 channel for black and white images,3 elements(RGB)for coloured images or 4 elements(RGBA)for coloured images with opacity.Video A succession of images(frames),sometimes grouped with a time series(a sound).NLP&Speech processing Text A succession of characters(e.g.a tweet,a text field).Time series5 A series of data points(e.g.numerical)indexed in time order.Classic Data Science Structured Data Data organised in a predefined model of array with one specific column for each feature(e.g.textual,numerical data,date).To be more accurate,structured data refer to organised data that can be found in a relational data base for example(that may contain textual columns as mentioned).Quantitative data can be distinguished from qualitative data.Quantitative data corresponds to the numerical data that can supports some arithmetic operations whereas qualitative data is usually used as categorical data to classify data according to their similarities.Certain domains such as NLP and Computer Vision have been separated from Classic Data Science.The purpose of this separation was to make a distinction between algorithms that may be used specifically or predominantly for each domain.4 Audio data are also used for speech recognition.For the purposes of this report,we consider only text for the NLP for the taxonomy.considering that this will not create differences for the work on threats.5 For the purposes of this report,time series belong to the two main domains:Classic Data Science and Speech processing.By restraining Time series to Classic Data Science and Speech processing,we aspired to emphasise the specific approaches that are used for this domain like ARIMA and Hidden Markov Model.Furthermore,we include audio data under time series and made the choice to separate video from time series.SECURING MACHINE LEARNING ALGORITHMS December 2021 9 2.2 LEARNING PARADIGMS Learning paradigm in ML relates to how a machine learns when data is fed to it.For example,all the classification and regression algorithms use labelled data,meaning that they are doing only supervised learning.Indeed,supervised learning,by definition,is the learning of labelled data,which can be either numerical(in this case,the learning 
paradigm is regression), or categorical (the learning paradigm is classification). An example of classification is differentiating a cat from a dog in a picture; an example of regression is predicting the price of a house. On the other hand, a clustering algorithm uses unlabelled data, which is an unsupervised type of learning. Therefore, one can conclude that each learning paradigm is a specific case of one data environment.

In addition to the data types fed into the algorithms, we also focused on three learning paradigms, namely supervised learning, unsupervised learning and reinforcement learning:
- Supervised learning learns a function that maps an input to an output based on example input-output pairs. It infers a function from labelled training data consisting of a set of training examples.
- Unsupervised learning learns patterns from unlabelled data. It discovers hidden patterns or data groupings without the need for human intervention.
- Reinforcement learning enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences.

Table 2: Learning paradigms with typical subtypes

Supervised learning
- Classification: the process of predicting the class of given data points (is the picture a cat or a dog?).
- Regression: regression models are used to predict a continuous value (predict the price of a house based on its features).

Unsupervised learning
- Clustering: the task of dividing a set of data points into groups such that data points in the same group are more similar to each other than to data points in other groups.
- Dimensionality reduction: techniques for reducing the number of input variables in training data.

Reinforcement learning
- Rewarding: the area of ML concerned with how intelligent agents ought to take actions in an environment to maximise a notion of cumulative reward, learning from the feedback of their experiences.

Each of these learning paradigms has different security-related properties which may lead to attacks, and it is therefore relevant to represent this information in the taxonomy of ML algorithms, to which security controls will be mapped. For instance, classification is the most common learning paradigm and thus has many more examples of vulnerabilities, owing to its popularity.
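To make the distinction between the paradigms concrete, the following minimal sketch contrasts supervised classification with unsupervised clustering. It assumes scikit-learn and the classic Iris dataset purely for illustration; the library, dataset and estimators are assumptions made for this example and are not choices prescribed by this report.

    # Contrast of two learning paradigms from Table 2: supervised classification
    # (labelled data) versus unsupervised clustering (no labels).
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Supervised learning: the model is fitted on labelled input-output pairs.
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print("classification accuracy:", clf.score(X_test, y_test))

    # Unsupervised learning: the algorithm only sees the inputs and finds groupings itself.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])

The supervised model is evaluated against held-out labels, whereas the clustering step never sees them; this is exactly the distinction the taxonomy captures.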
2.3 NAVIGATING THE TAXONOMY

Each algorithm is placed in its corresponding cell of the taxonomy grid, according to its learning paradigm, data type and main domain. For instance, Recurrent Neural Networks [6] (RNN), a type of neural network suited to modelling sequential data, are used for regression in supervised learning, so they are mapped in the first column. Moreover, the data fed into them can be text, time series, images or videos, so the RNN box covers all the corresponding lines in the taxonomy.

However, some of the widely used algorithms are based on common elementary components, or are extensions of the same principle, and can therefore form families or clusters of algorithms on the taxonomy grid. Hence, we map those algorithms in groups by using nested boxes, which allows a wide variety of algorithms to be represented while showing that some are related to one another. To continue with the previous example, a more recent variant of RNN is LSTM [7] (Long Short-Term Memory), which differs from RNN in its optimisation techniques, making it faster to learn and more precise. Since LSTM is a specific extension of RNN, the LSTM box is nested in the RNN box in the taxonomy, indicating that the two algorithms are part of the same family.

2.4 EXPLAINABILITY AND ACCURACY

An important aspect of the security of AI is explainability. Understanding the algorithms and making them explainable makes them more accessible to as many people as possible. It also helps to increase the trustworthiness of AI and to support forensics and the analysis of decisions. Following inputs from the desk research exercise and from the research on attacks targeting ML models, we additionally included two important parameters in the taxonomy:
- Explainability: for the purposes of this study, an algorithm is deemed explainable if the decisions it makes can be understood by a human, such as a developer or an auditor, and then explained to an end user, for example. To be fully explainable, an algorithm must be:
  o Globally explainable: a user can identify the importance of the features for the trained model.
  o Locally explainable: a user can explain why the algorithm gives a specific output (prediction) for specific input data (feature values).
- Accuracy (probability score): some algorithms provide, in addition to a predictive output, the probability of this prediction, which can be interpreted as an "accuracy level". If a classification algorithm predicts that a picture of a cat is indeed a picture of a cat at 95% accuracy, one can say that the algorithm makes a "high accuracy classification"; if the prediction is made at 55% accuracy, one can speak of a "low accuracy classification".

It is important to note that we focused on the algorithms' explainability because this work matters for other parts of the publication. For example, one identified security control highlights the need to ensure that ML projects comply with regulatory constraints such as the GDPR, which describes some explainability requirements [8].

[6] https://apps.dtic.mil/dtic/tr/fulltext/u2/a164453.pdf
[7] https://www.bioinf.jku.at/publications/older/2604.pdf
[8] GDPR Recital 71: "The data subject should have the right not to be subject to a decision, which may include a measure, evaluating personal aspects relating to him or her which is based solely on automated processing and which produces legal effects concerning him or her or similarly significantly affects him or her, such as automatic refusal of an online credit application or e-recruiting practices without any human intervention. In any case, such processing should be subject to suitable safeguards, which should include specific information to the data subject and the right to obtain human intervention, to express his or her point of view, to obtain an explanation of the decision reached after such assessment and to challenge the decision."
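As a concrete illustration of these two parameters, the short sketch below inspects the global feature importances of a trained model and the probability score attached to a single prediction. scikit-learn, the Iris dataset and the random-forest estimator are assumptions made only for the example, not choices from this report.

    # Global explainability (feature importances of the trained model) and the
    # probability score attached to a single prediction.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    data = load_iris()
    clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

    # Globally explainable: which input features matter most for the trained model.
    for name, weight in zip(data.feature_names, clf.feature_importances_):
        print(f"{name}: {weight:.2f}")

    # Accuracy (probability score): the class probabilities behind one prediction.
    probabilities = clf.predict_proba(data.data[:1])[0]
    predicted = data.target_names[probabilities.argmax()]
    print(f"predicted '{predicted}' with probability {probabilities.max():.0%}")

Note that exposing such probability scores to end users is itself a security trade-off, as discussed later under the control "Reduce the information given by the model".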
2.5 AN OVERVIEW OF AN END-TO-END MACHINE LEARNING LIFECYCLE

An ML system lifecycle includes several interdependent phases ranging from design and development (including sub-phases such as requirement analysis, data collection, training, testing and integration), installation, deployment, operation and maintenance, to disposal. It defines the phases that an organisation should follow to take advantage of AI, and of ML models in particular, to derive practical business value. These phases can be represented by the architecture illustrated in Figure 2 [9].

Figure 2: Typical AI lifecycle (from the ENISA AI Threat Landscape)

Building on the AI lifecycle, Figure 3 gives an overview of a typical ML lifecycle and its principal steps.

[9] https://www.enisa.europa.eu/publications/artificial-intelligence-cybersecurity-challenges

Figure 3: ML algorithm lifecycle [10], [11]

The aim of the ML algorithm taxonomy is to focus not only on the functionalities of the algorithms but also on the workflow of ML models, represented by the lifecycle. This lifecycle summarises the principal steps needed to produce an ML model. Several steps could have been added, such as data creation and data analysis (for instance, to analyse whether the data contain personal data or biases). However, to simplify the lifecycle, some steps have been condensed; data cleaning, for example, has been included. Data creation was considered to be external to the ML lifecycle.

[10] Optimisation is also known as model tuning.
[11] Data cleaning and data preprocessing have been separated to distinguish the cleaning phase from the adaptation phase of the dataset for learning (dimension reduction, feature engineering, etc.).

3. ML THREATS AND VULNERABILITIES

3.1 IDENTIFICATION OF THREATS

Based on the methodology described in the Introduction, and using a combination of desk research and expert interviews, we identified a list of six high-level threats and seven sub-threats that were then mapped to the taxonomy. It is important to note that:
- Threats against supporting infrastructures are not analysed in this publication.
- All threats relate to the previous ENISA publication on the AI Threat Landscape; accordingly, they have been mapped to AI assets (environments, tools, data, etc.).

The following table summarises the machine learning threats and includes:
- Threat and sub-threat definitions.
- Whether they are specific to ML algorithms or not.
- At which stage of the lifecycle defined in the first section the threat is likely to occur.

Table 3: Threats and sub-threats

Evasion: a type of attack in which the attacker works on the ML algorithm's inputs to find small perturbations leading to large modifications of its outputs (e.g. decision errors). It is as if the attacker created an optical illusion for the algorithm. Such modified inputs are often called adversarial examples. Example: the projection of images onto a house could lead the algorithm of an autonomous car to decide to brake suddenly.

Sub-threat, use of adversarial examples crafted in white or grey box conditions (e.g. FGSM): in some cases, the attacker has access to information (the model, model parameters, etc.) that allows them to build adversarial examples directly. One example is to use the model's gradient directly to find the best perturbation to add to the input data in order to evade the model.
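To ground the white-box case just described, the sketch below shows a schematic FGSM-style perturbation in PyTorch. The model, the input batch and the labels are placeholders for any differentiable classifier; this is an illustrative sketch of the gradient-based technique named above, not code taken from the report.

    # Schematic white-box, FGSM-style perturbation: one gradient step on the input,
    # in the direction that most increases the loss of the (placeholder) classifier.
    import torch
    import torch.nn.functional as F

    def fgsm_example(model, x, label, epsilon=0.01):
        """Return a perturbed copy of x intended to push the model towards a wrong output."""
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)
        loss.backward()
        # The sign of the input gradient gives, per feature, the direction that
        # maximises the loss; epsilon bounds the size of the perturbation.
        return (x_adv + epsilon * x_adv.grad.sign()).detach()

Because the perturbation is bounded by epsilon per feature, the adversarial example can remain visually indistinguishable from the original input while still changing the model's decision.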
Oracle: a type of attack in which the attacker explores a model by providing a series of carefully crafted inputs and observing its outputs. These attacks can be preliminary steps to more harmful ones, for example evasion or poisoning. It is as if the attacker made the model talk, in order to better compromise it afterwards or to obtain information [204] about it (e.g. model extraction) or about its training data (e.g. membership inference attacks and inversion attacks). Example: an attacker studies the set of input-output pairs and uses the results to retrieve training data.

Poisoning: a type of attack in which the attacker alters the data or the model to modify the ML algorithm's behaviour in a chosen direction (e.g. to sabotage its results or to insert a backdoor). It is as if the attacker conditioned the algorithm according to its motivations. Such attacks are also called causative attacks. Example: massively indicating to an image recognition algorithm that images of dogs are indeed cats, to lead it to interpret them this way.

Sub-threat, label modification: an attack in which the attacker corrupts the labels of training data. This sub-threat is specific to supervised learning.

Model or data disclosure: this threat refers to the possibility of leakage of all or partial information about the model [12], at any stage of the lifecycle. Example: the outputs of an ML algorithm are so verbose that they give information about its configuration (or leak sensitive data).

[12] We have chosen to separate oracle attacks from this threat in order to address the specifics of both and give each a fair representation. However, oracle-type attacks may be considered an ML-specific sub-threat of model or data disclosure.

Sub-threat, data disclosure: a leak of data manipulated by ML algorithms, possible at any stage of the lifecycle. Such a leak can result from inadequate access control, a handling error by the project team, or simply from the fact that the entity that owns the model and the entity that owns the data are sometimes distinct. To train the model, the data often need to be accessed by the model provider; this involves sharing the data, and thus sharing sensitive data with a third party.

Sub-threat, model disclosure: a leak of the internals (i.e. parameter values) of the ML model. Such a leak could occur because of human error or a contract with a third party whose security level is too low.

Compromise of ML application components: the compromise of a component or development tool of the ML application, at any stage of the lifecycle. Example: compromise of one of the open-source libraries used by the developers to implement the ML algorithm.

Failure or malfunction of ML application: an ML application failure (e.g. denial of service due to bad input, unavailability due to a handling error). Example: the service level of the supporting infrastructure of the ML application, hosted by a third party, is too low compared to the business needs, and the application is regularly unavailable. Note that this threat does not cover failures of the business use case (for example, the algorithm fails because it is not accurate enough to handle all the real-life situations it is exposed to).

Sub-threat, human error: the different stakeholders of the model can make mistakes that result in a failure or malfunction of the ML application. For example, due to lack of
documentation,they may use the application in use-cases not initially foreseen.x x x x x x x x x x Denial of service due to inconsistent data or a sponge example ML algorithms usually consider input data in a defined format to make their predictions.Thus,a denial of service could be caused by input data whose format is inappropriate.It may also happen that a malicious user of the model constructs an input data(a sponge example)specifically designed to increase the computation time of the model and thus potentially cause a denial of service.x Cybersecurity incident not reported to incident response teams This threat refers to the possibility that a project team may not report security incidents to dedicated teams while a policy of mandatory incident reporting has been defined.x x x x x x x x x x SECURING MACHINE LEARNING ALGORITHMS December 2021 16 3.2 VULNERABILITIES MAPPED TO THREATS To identify the security controls,we determined vulnerabilities associated with the threats described in the previous section.It is important to note that the same vulnerabilities may be found behind one or more threats(e.g.the“Poor access management”vulnerability).The table below lists vulnerabilities of ML algorithms and maps them to the aforementioned threats.Table 4:Threats and associated vulnerabilities Threats|sub-threats Vulnerabilities Evasion Lack of detection of abnormal inputs Poor consideration of evasion attacks in the model design implementation Poor consideration of evasion attacks in the model design implementation Lack of training based on adversarial attacks Using a widely known model allowing the attacker to study it Inputs totally controlled by the attacker which allows for input-output-pairs Use of adversarial examples crafted in white or grey box conditions(e.g.FGSM)Too much information available on the model Too much information about the model given in its outputs Oracle Poor access rights management The model allows private information to be retrieved Too much information about the model given in its outputs Too much information available on the model Lack of consideration of attacks to which ML applications could be exposed to Lack of security process to maintain a good security level of the components of the ML application Weak access protection mechanisms for ML model components Poisoning Model easy to poison Lack of data for increasing robustness to poisoning Poor access rights management Poor data management Undefined indicators of proper functioning,making complex compromise identification Lack of consideration of attacks to which ML applications could be exposed to Use of uncontrolled data Use of unsafe data or models(e.g.with transfer learning)Lack of control for poisoning No detection of poisoned samples in the training dataset Weak access protection mechanisms for ML model components Label modification Use of unreliable sources to label data Model or data disclosure Poor access rights management Existence of unidentified disclosure scenarios Weak access protection mechanisms for ML model components SECURING MACHINE LEARNING ALGORITHMS December 2021 17 Threats|sub-threats Vulnerabilities Lack of security process to maintain a good security level of the components of the ML application Unprotected sensitive data on test environments Data disclosure Too much information about the model given in its outputs The model can allow private information to be retrieved Disclosure of sensitive data for ML algorithm training Model disclosure Too much information available on the model 
Too much information about the model given in its outputs Compromise of ML application components Poor access rights management Too much information available on the model Existence of several vulnerabilities because the ML application was not included into process for integrating security into projects Use of vulnerable components(among the whole supply chain)Too much information about the model given in its outputs Existence of unidentified compromise scenarios Undefined indicators of proper functioning,making complex compromise identification Bad practices due to a lack of cybersecurity awareness Lack of security process to maintain a good security level of the components of the ML application Weak access protection mechanisms for ML model components Existence of several vulnerabilities because ML specificities are not integrated to existing policies Existence of several vulnerabilities because ML application do not comply with security policies Contract with a low security third party Failure or malfunction of ML application Existing biases in the ML model or in the data ML application not integrated in the cyber-resilience strategy Existence of unidentified failure scenarios Undefined indicators of proper functioning,making complex malfunction identification Lack of explainability and traceability of decisions taken Lack of security process to maintain a good security level of the components of the ML application Existence of several vulnerabilities because ML specificities are not integrated in existing policies Contract with a low security third party Application not compliant with applicable regulations Human error Poor access rights management Lack of documentation on the ML application Denial of service due to inconsistent data or a sponge example Use of uncontrolled data Cybersecurity incident not reported to incident response teams Lack of cybersecurity awareness SECURING MACHINE LEARNING ALGORITHMS December 2021 18 4.SECURITY CONTROLS 4.1 SECURITY CONTROLS RESULTS Having identified a set of threats that can target vulnerabilities in applications which use ML algorithms,it is possible to identify which security controls can be put in place to mitigate them.To do this,we commenced with the vulnerabilities identified in the previous Chapter and came up with a list of 37 security controls that were then mapped to the taxonomy.Table 5 summarises security controls for ML algorithms and lists:Security controls definitions.At which stage of the lifecycle the security controls can be applied.For ease of reading,they were divided into three categories:“Organisational and Policy”are more traditional security controls,either organisational or linked to security policies.“Technical”are more classic technical security controls.“Specific to ML”are security controls that are specific to applications using ML.In Annex 5.C,a set of operational implementation examples are listed for each of the security controls.This includes:For security controls not specific to ML algorithms:examples from the ISO 27001/213 family of standards or NIST 800-5314 framework that should be considered when implementing the security control.For security controls specific to ML:examples of techniques found in the current literature.All sources are referenced and may be found in Annex 5.D.The overall mapping of threats,vulnerabilities and security controls is available in Annex 5.B.13 https:/www.iso.org/isoiec-27001-information-security.html 14 https:/csrc.nist.gov/publications/detail/sp/800-53/rev-5/final SECURING 
MACHINE LEARNING ALGORITHMS December 2021 19 Table 5:Security controls Security controls Definition Stages of the lifecycle Data Collection Data Cleaning Data Preprocessing Model design and Implementation Model Training Model Testing Optimisation Model Evaluation Model Deployment Monitoring ORGANISATIONAL Apply a RBAC model,respecting the least privileged principle Define access rights management using a RBAC(Role Based Access Control)model respecting the least privileged principle.This should cover all components of the ML model(e.g.host infrastructures)and allow for the protection of resources such as the model(e.g.its configuration,its code)and the data it used(e.g.training data).It is notable that the roles to be included also concern the end user.For example:the end user who can submit inputs to the model should not be able to have access to its configuration.x x x x x x x x x x Apply documentation requirements to AI projects As for all projects,documentation must be produced for AI to preserve knowledge on the choices made during the project phase,the application architecture,its configuration,its maintenance,how to maintain its effectiveness over time and the assumptions made about the model use.This documentation should also include the changes that will be applied,including to the documentation throughout the algorithms life cycle.x x x x x x x x x x Assess the regulations and laws the ML application must comply with As all applications,those using ML can be subject to regulations and laws(e.g.,depending on collected data).Such assessment must be done as soon as possible during the project phase,and should be regularly updated thereafter as regulations are rapidly evolving(e.g.,an AI Act has been proposed at the European level).x x x x x x x x x x Ensure ML applications comply with data security requirements As all applications,those using ML must comply with data security requirements to ensure the overall lifecycle of the data they use will be secured(e.g.description of data lifecycle and associated controls,data classification,protection of data at rest and in transit,use of appropriate cryptographic means,data quality controls).x x x x x x x x x x Ensure ML applications comply with identity management,authentication,and access control policies As all applications,those using ML must comply with defined policies regarding identity management(e.g.ensure all users are integrated in the departure process),authentication(e.g.passwords complexity,use of Multi-Factors Authentication(MFA),access restriction)and access control(e.g.RBAC model,connection context).Underlying security requirements must be applied to all ML application components(e.g.model configuration,host infrastructures,training data).x x x x x x x x x x SECURING MACHINE LEARNING ALGORITHMS December 2021 20 Security controls Definition Stages of the lifecycle Data Collection Data Cleaning Data Preprocessing Model design and Implementation Model Training Model Testing Optimisation Model Evaluation Model Deployment Monitoring Ensure ML applications comply with protection policies and are integrated to security operations processes As all applications,those using ML must comply with protection policies(e.g.hardening,anti-malware policy)and be integrated to security operations processes(e.g.vulnerability management,backups).x x x x x x x x x x Ensure ML applications comply with security policies As all applications,those using ML must comply with existing security policies.x x x x x x x x x x Include ML applications into 
detection and response to security incident processes15 As all applications,those using ML must be integrated in global processes for detection and incident response.This implies collecting the appropriate logs,configuring relevant detection use cases to detect attacks on the application,and giving the keys to incident response team for efficient response.x x x x x x x x x X Include ML applications in asset management processes As all applications,those using ML must be integrated to global processes for asset management to ensure their assets are inventoried,their owners are identified,their information classified.x x x x x x x x x X Integrate ML applications into the overall cyber-resilience strategy As any application,ML ones must be integrated in the overall cyber-resilience strategy,to ensure their architecture and operational processes(e.g.backups)take into account cybersecurity scenario.x x x x x x x x x x Integrate ML specificities to existing security policies Specific ML security attention points should be integrated in existing security policies and guidelines to ensure they are taken into consideration.x x x x x x x x x x TECHNICAL Assess the exposure level of the model used Some model designs are more commonly used or shared than others and,especially in the ML field;it can be included in their lifecycle to widely share them(e.g.open source sharing).These aspects must be considered in the global application risk analysis.For example,two elements can be distinguished:-Do not reuse models taken directly from the internet without checking them.-Use models for which the threats are clearly identified and for which security controls exist.x x x x x x x x x x 15 Please note that ML components with false positives might have adverse effect.SECURING MACHINE LEARNING ALGORITHMS December 2021 21 Security controls Definition Stages of the lifecycle Data Collection Data Cleaning Data Preprocessing Model design and Implementation Model Training Model Testing Optimisation Model Evaluation Model Deployment Monitoring Check the vulnerabilities of the components used so that they have an appropriate security level During the lifecycle of an ML algorithm,several components(such as software,programming libraries or even other models)are used to complete the project.Security checks have to be carried out to ensure that these components offer an adequate level of security.Moreover,some mechanisms need to be used to prevent tampering with the components used.For example:if an open-source library is to be used,code reviews or check for public vulnerabilities on it can be done.x x x x x x x x x x Conduct a risk analysis of the ML application A risk analysis of the overall application should be conducted to take into account the specificities of its context,including:-The attackers motivations-The sensitivity of the data handled(e.g.medical or personal and thus subject to regulatory constraints,strategic for the company and should thus be highly protected)-The application hosting(e.g.through third parties services,cloud or on premise environments)-The model architecture(e.g.its exposition,learning methods)-The ML application lifecycle(e.g.,model sharing x x x x x x x x x x Control all data used by the ML model Data must be checked to ensure they will suit the model and limit the ingestion of malicious data:-Evaluate the trust level of the sources to check its appropriate in the context of the application-Protect their integrity along the whole data supply chain-Their format and consistence are 
verified.
- Their content is checked for anomalies, automatically or manually (e.g. selective human control).
- In the case of labelled data, the issuer of the label is trusted.
This control applies across the whole lifecycle.

Ensure reliable sources are used: ML is a field in which the use of open-source elements is widespread (e.g. data for training, including labelled data, and models). The trust level of the different sources used should be assessed to avoid using compromised ones. For example, if the project wants to use labelled images from a public library: are the contributors sufficiently trusted to have confidence in the images and in the quality of their labelling?

Use methods to clean the training dataset from suspicious samples: removing suspicious samples from the training and testing datasets can help prevent poisoning attacks. Methods exist to identify the samples that could cause strange behaviour of the algorithm.

Define and monitor indicators for proper functioning of the model: define dashboards of key indicators, including security indicators (e.g. peaks of change in model behaviour), to follow up the proper functioning of the model with regard to the business case, in particular to allow rapid identification of anomalies.

Ensure appropriate protection is deployed for test environments: test environments must also be secured according to the sensitivity of the information they contain. Special care must be paid to the data used in these environments to ensure their protection (e.g. the same protection measures as for production if the data are not desensitised). Applies across the whole lifecycle.

Ensure ML applications comply with third parties' security requirements: as with all applications, those using ML must comply with third parties' security requirements if their context involves suppliers. Applies across the whole lifecycle.

Ensure ML projects follow the global process for integrating security into projects: as with any project, ML projects must comply with the process for integrating security into projects, including the following:
- Risk analysis of the whole application.
- Checking the integration of cybersecurity best practices regarding architecture and secure development.
- Checking that the application will be integrated into existing operational security processes: monitoring and response, patch management, access management, cyber-resilience.
- Checking the production of adequate documentation to ensure the sustainability of the application (e.g. technical architecture, hardening, operations, configuration and installation documents).
- Security checks before going to production (e.g. security audits, penetration tests).
Applies across the whole lifecycle.

SPECIFIC ML

Add some adversarial examples to the training dataset [16]: include adversarial examples in the algorithm's training to make it more resilient to such attacks. Depending on the application domain and ambient conditions, such training could be done continuously.

Apply modifications on inputs [17]: adding a step that modifies the model's inputs (e.g. data randomisation, which consists of adding random noise to each piece of data) can improve the robustness of the model to attacks. Such a step can make it more difficult for an attacker to understand the functioning of the algorithm, and thus to manipulate it, and can reduce the impact of an attack. This security control can be applied during the training or model deployment stages.

[16] This security control is often referred to as "robust adversarial training" in the literature.
[17] One important thing to keep in mind is that such modifications should not overly impact model performance on benign inputs.
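The sketch below illustrates these two ML-specific controls: mixing adversarial examples into each training batch (robust adversarial training) and randomising inputs before prediction. The model, optimiser, loss function, data batch and the adversarial-example routine are placeholders under assumed PyTorch usage; this is an illustrative sketch, not the report's reference implementation.

    # Two of the ML-specific controls above: augmenting each training batch with
    # adversarial counterparts, and adding random noise to inputs at prediction time.
    import torch

    def adversarial_training_step(model, optimiser, loss_fn, x, y, make_adversarial):
        """One optimisation step on a batch enlarged with its adversarial counterparts."""
        x_adv = make_adversarial(model, x, y)      # e.g. an FGSM-style routine
        inputs = torch.cat([x, x_adv])             # benign + adversarial examples
        targets = torch.cat([y, y])
        optimiser.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimiser.step()
        return loss.item()

    def randomised_prediction(model, x, noise_scale=0.01):
        """Add random noise to the input before prediction ("apply modifications on inputs").
        The noise scale is a trade-off: too much noise degrades accuracy on benign inputs."""
        return model(x + noise_scale * torch.randn_like(x))

As footnote 17 warns, both controls trade some benign-input performance for robustness, so the perturbation budget and noise scale need to be tuned against the business requirements of the use case.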
Build explainable models: ML models should be explainable, even if it means simplifying them, to enable a good understanding of their functioning and decision factors. This can also be a regulatory requirement (e.g. GDPR). However, security again interferes with the explainability of the model: decisions that are easier to understand can make adversarial examples easier to build. It is therefore a trade-off between the need for explainability and security.

Choose and define a more resilient model design: some model designs are more robust than others against attacks. For instance, ensemble methods such as bagging can mitigate the impact of poisoning (during the training phase). Another example is defensive distillation, which may allow deep neural networks to deal better with evasion attacks.

Enlarge the training dataset: using training data expansion techniques (e.g. data augmentation) addresses the lack of data and improves the robustness of the model to poisoning attacks by diluting their impact. Note, however, that this security control addresses poisoning attacks that aim to reduce the performance of the model rather than those that seek to establish a backdoor. Moreover, the reliability of the sources used to augment the dataset needs to be ensured.

Ensure that models are unbiased: the introduction of bias in ML algorithms is not detailed here because it is not the topic of this publication. However, some techniques can be used to mitigate bias: verify that the training dataset is representative enough with regard to the business case, check the relevance of the attributes used to make decisions, etc.

Ensure that models respect differential privacy to a sufficient degree: differential privacy (DP) is a strong, mathematical definition of privacy in the context of statistical and ML analysis. According to this mathematical definition, DP is a criterion of privacy protection that many tools for analysing sensitive personal information have been devised to satisfy. Note that this security control can greatly reduce the performance of the model; it is therefore important to estimate the need for data or model protection. Example: differential privacy makes it possible for technology companies to collect and share aggregate information about user habits while maintaining the privacy of individual users.

Ensure that the model is sufficiently resilient to the environment in which it will operate: this includes, for instance, ensuring that the learning process and the data are representative enough of the real conditions in which the model will evolve. Applies across the whole lifecycle.

Implement processes to maintain security
levels of ML components over time ML is a rapidly evolving field,especially regarding its cybersecurity.Regular checking of new attacks and defenses must be integrated into the processes for maintaining security level applications.The security level should thus be regularly assessed too.x x x x x x x x x X Implement tools to detect if a data point is an adversarial example or not Input-based detection tools can be of interest to identify whether a given input has been modified by an attacker or not.One example,in the case of Deep Neural Networks(DNNs),is to add a neural subnetwork to an architecture trained to detect adversarial examples.x x x Integrate ML specificities to awareness strategy and ensure all ML stakeholders are receiving it ML considerations should be added to awareness programs for concerned stakeholders and they must all receive cybersecurity awareness training:-Global cybersecurity awareness training including best practices to prevent attackers compromising the ML application.-Manipulation of potentially sensitive data or data subject to regulatory restrictions.-Configurations to prevent applications being vulnerable-ML-specific attack awareness x x x x x x x x x x Integrate poisoning control after the model evaluation phase Before moving the model to production and then on a regular basis,the model should be evaluated to ensure it has not been poisoned.This differs from the security control“Use methods to clean the training dataset from suspicious samples”.Indeed,here,its the model itself that is evaluated.For example:deep learning classification algorithms can be checked for poisoning using the STRIP18 technique.The principle is to disturb the inputs and observe the randomness of the predictions.x Reduce the available information about the model This defense consists of limiting the information about the model when it is not necessary.More precisely,it aims at taking the necessary actions in order to reduce the information available on the model such as information on the training data set or any other information that could be used by an attacker(e.g.,not publishing the model in open source).Of course,there is a trade-off between security and the fact that stakeholders(e.g.,users,ML teams)sometimes want open source models.However,it remains notable that in many cases,research has shown that minimal information is sufficient to mount attacks.x x x x x x x x x x 18 See https:/arxiv.org/pdf/1902.06531.pdf.It is notable that STRIP(STRong Intentional Perturbatio)may have a huge runtime overhead and may be infeasible for large dataset.SECURING MACHINE LEARNING ALGORITHMS December 2021 25 Security controls Definition Stages of the lifecycle Data Collection Data Cleaning Data Preprocessing Model design and Implementation Model Training Model Testing Optimisation Model Evaluation Model Deployment Monitoring Reduce the information given by the model19 Controlling the information(like its verbosity)provided by the model by applying basic cybersecurity hygiene rules is a way of limiting the techniques that an attacker can use to build adversarial examples.One of the basic rules of hygiene,for example,is to reduce the information of the output determined by the model to the maximum,or by profile making the request.For example:considering a classification application,it would consist of communicating only the predicted class to the users of solution,not the associated probability.However,it remains notable that in many cases,research has shown that minimal information is sufficient 
Use federated learning to minimise the risk of data breaches: Federated learning is a set of training techniques that trains a model on several decentralised servers holding local data samples, without exchanging those samples. This avoids the need to transfer the data and/or entrust it to an untrusted third party and thus helps to preserve the privacy of the data (a minimal sketch is given below).

Use less easily transferable models20: The transferability property can be used to force adversarial examples crafted against a substitute model to evade another model. How easily an adversarial example transfers from one model to another depends on the family of algorithms. One possible defense is thus to choose an algorithm family that is less sensitive to the transferability of adversarial examples.

19 It is important to keep in mind that, in the case of attacks such as evasion or oracle attacks, this security control can help. However, in some cases it may be possible to bypass it by using more queries.

20 Some evasion attacks are based on the following principle: train a model on data similar to that used by the target model and generate adversarial examples from this substitute model, then present these adversarial examples to the target model to perform an evasion attack. Whether an adversarial example generated by one model transfers to another depends on their respective designs, as shown in reference 215.
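As an illustration of the federated learning control above, the following is a minimal federated-averaging (FedAvg) sketch, assuming each client runs a local_update function that returns updated model parameters and that only these parameters, never the raw data, are sent to the server. The function names, the toy usage and the weighting scheme are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np

def federated_round(global_weights, client_datasets, local_update, client_sizes):
    """One FedAvg round: each client updates the model locally and the server
    averages the returned parameters, weighted by local dataset size."""
    updates = [local_update(global_weights, data) for data in client_datasets]
    total = float(sum(client_sizes))
    return [
        sum(w * (n / total) for w, n in zip(layer, client_sizes))
        for layer in zip(*updates)
    ]

# Toy usage: a single "layer" of weights and a dummy local update rule.
def dummy_update(weights, data):
    return [weights[0] + data.mean()]   # stands in for real local training

if __name__ == "__main__":
    global_weights = [np.zeros(3)]
    clients = [np.array([1.0, 1.0, 1.0]), np.array([3.0, 3.0, 3.0])]
    print(federated_round(global_weights, clients, dummy_update, [100, 300]))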
5. CONCLUSION

Machine Learning algorithms are at the core of modern AI systems and applications. However, they face a series of threats and vulnerabilities. In this report we have identified multiple security controls that can be applied to ML applications to address the threats they face. Some of these controls are specific to ML algorithms, while others are standard technical and organisational cybersecurity controls that mitigate general attacks. It is important to apply both types of controls because, in addition to ML-specific vulnerabilities, AI systems also have general vulnerabilities that may be exploited by adversaries.

Mitigation controls for the ML-specific attacks outlined in the report should in general be deployed during the entire lifecycle of the ML system. This includes measures for assuring data quality and protecting its integrity, making the ML algorithms more robust, and controlling access to both the model and the data to ensure their privacy. The report also emphasises the need for explainability of decisions, and the importance of detecting bias that can be present in a model or injected by an attacker, which can then lead to unethical uses of AI.

An important point highlighted in the report is that the identified security measures can be applied to all algorithms. Nevertheless, their operational implementations (see Annex C) may be specific to certain types of algorithms. For example, for the security control "Choose and define a more resilient model design", the defensive distillation implementation is specific to neural networks. It is also notable that, given the prevalence of research papers on supervised learning, there are more examples of operational implementations for this type of algorithm.

This report addresses an emerging subject. It therefore remains very important to keep an active watch on threats and security controls in the field of ML, both to follow the latest technical innovations and to comply with standards provided by ISO, IEEE and ETSI21. When looking ahead, and given the complexity of the issue of securing ML, companies and governments have new responsibilities. For instance, it is increasingly important to raise cybersecurity awareness within companies, especially regarding the security of ML systems. For some groups, particularly data science teams, cybersecurity has not been at the forefront of concerns for many years. Moreover, by including data science actors in these actions, they are also given the opportunity to devise innovative solutions to mitigate the various threats. To this end, training and education programs should be organised regularly and the vulnerabilities of ML should be demonstrated using concrete examples.

Finally, the context in which security controls are applied is crucial, and specific use cases should be considered when conducting targeted risk assessments. All mitigations used should be proportional to the application-specific threat level and consider the specific conditions of the environment that may either favour or hamper attacks. Moreover, defenders should be aware of the following points:

1) There is no silver bullet for mitigating ML-specific attacks. Some security controls may be bypassed by adaptive attackers. However, the mitigations applied can still raise the bar for attackers.

2) ML-specific mitigation controls are generally not evaluated in a standardised way, even though standardised evaluation is a current and important issue for enabling comparability. More research should be devoted to standardised benchmarks for comparing ML-specific mitigations on a level playing field. These benchmarks should also be enforced to ensure that the methods used in practice are the ones that perform best.

3) Deploying security controls often leads to a trade-off between security and performance; this is a topic of particular importance that should be further pursued by the research and cybersecurity communities.

21 https://www.etsi.org/committee/sai

A ANNEX: TAXONOMY OF ALGORITHMS Algorithm Name Definition Main domain Data type Data environments Learning Paradigm Explainability Accuracy Provided Refs AdaBoost AdaBoost uses multiple iterations to generate a single composite strong learner by iteratively adding weak learners.During each phase of training,a new weak learner is added to the ensemble,and a weighting vector is adjusted to focus on examples that were misclassified in previous rounds.Classic Data Science Structured data Supervised learning Classification,Regression Globally Explainable 38 Adam optimisation Adam optimisation is an extension to Stochastic gradient descent and can be used in place of classical stochastic gradient descent to update network weights more efficiently,thanks to two methods:adaptive learning rate and momentum Classic Data Science Structured data/Optimisation 24 Agglomerative clustering Agglomerative clustering is a bottom-up approach of hierarchical clustering.Each observation starts in its own cluster,and pairs of clusters are merged as one moves up the hierarchy.Classic Data Science Structured data Unsupervised Learning Clustering 32 ARMA/ARIMA model Given a time series Xt,the ARMA/ARIMA model is a tool to understand and predict the future values of this series.The model is composed of two parts:an autoregressive part(AR)and a moving average
part(MA)Classic Data Science Time series Supervised learning Regression Fully Explainable 136 BERT Bidirectional Encoder Representations from Transformers(BERT)is a Transformer-based ML technique for natural language processing(NLP)pre-training developed by Google.NLP&Speech processing Text Supervised learning Classification Not Explainable Yes 5 Convolutional Neural Network A Convolutional Neural Network is a deep learning algorithm which can take in an input,assign importance(learnable weights and biases)to various aspects/objects in the data and be able to differentiate one from the other.Computer Vision,NLP&Speech processing Image,video,text,time series Supervised learning Classification Not Explainable Yes 16,22,36,43,49,50,56,58,59;64,67,68,69,70,82,89,103,124,161 SECURING MACHINE LEARNING ALGORITHMS December 2021 29 Algorithm Name Definition Main domain Data type Data environments Learning Paradigm Explainability Accuracy Provided Refs DBSCAN DBSCAN-Density-Based Spatial Clustering of Applications with Noise is a density-based clustering non-parametric algorithm:given a set of points in some space,it groups together points that are closely packed together(points with many nearby neighbours),marking as outliers points that lie alone in low-density regions(whose nearest neighbours are too far away).Computer Vision Image Unsupervised Learning Clustering 26,129,142 Decision tree A decision tree is a graph that uses a branching method to illustrate every possible output for a specific input in order to break down complex problems.Classic Data Science Structured data Supervised learning Classification,Regression Fully Explainable 40,42,120,Deep Q-learning Deep Q-learning works as Q-learning algorithm at the difference that it uses a neural network to approximate the Q-value function to manage big amount of states and actions.Classic Data Science Time series Reinforcement learning Rewarding Yes 65,85 EfficientNet EfficientNet is a Convolutional Neural Network based on depth wise convolutions,which makes it lighter than other CNNs.It also allows to scale the model with a unique lever:the compound coefficient.Computer Vision Image Supervised learning Classification Not Explainable Yes 4 Factor analysis of correspondences The factorial correspondence analysis(CFA)is a statistical method of data analysis which allows the analysis and prioritisation of the information contained in a rectangular table of data and which is today particularly used to study the link between two variables(qualitative or categorical).Classic Data Science Structured data Unsupervised Learning Dimension Reduction GAN A GAN is a generative model where two networks are placed in competition.The first model is the generator,it generates a sample(e.g.an image),while its opponent,the discriminator,tries to detect whether a sample is real or whether it is the result of the generator.Both improve on the performance of the other.Computer Vision Image,Video Unsupervised Learning 53,135 SECURING MACHINE LEARNING ALGORITHMS December 2021 30 Algorithm Name Definition Main domain Data type Data environments Learning Paradigm Explainability Accuracy Provided Refs GMM A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.Computer Vision,NLP&Speech processing Text,time series,Image,video,Unsupervised Learning Clustering 31,131 GPT-3 Generative Pre-trained Transformer 3(GPT-3)is an autoregressive language model 
that uses deep learning to produce human-like text.NLP&Speech processing Text Supervised learning Classification Not Explainable Yes 6 Gradient boosting machine Gradient boosting is a technique that optimises a decision tree by combining weak models to improve model prediction.Classic Data Science Structured data Supervised learning Classification,Regression Globally Explainable 3,51,54,55,140 Gradient descent Gradient descent is a first-order iterative optimisation algorithm for finding a local minimum of a differentiable function.The idea is to take repeated steps in the opposite direction of the gradient(or approximate gradient)of the function at the current point,because this is the direction of steepest descent.Classic Data Science Structured data/Optimisation 17 Graph neural networks(GNNs)Graph neural networks(GNNs)are deep learning-based methods that operate on graph domain.Graphs are a kind of data structure which models a set of objects(nodes)and their relationships(edges)Computer Vision,Speech processing Image Supervised learning Regression,classification 20 Hierarchical clustering Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters.The result is a tree-based representation of the objects,named a dendrogram.Classic Data Science Structured data Unsupervised Learning Clustering 32 Hidden Markov Model(HMM)Hidden Markov Model is a statistical Markov model in which the system being modelled is assumed to be a Markov process with unobservable hidden states.Structured data,NLP&Speech processing Structured data,time series,text Reinforcement learning Rewarding Yes 29 Independent component analysis ICA is a special case of blind source separation.A common example application is the cocktail party problem of listening in on one persons speech in a noisy room.Classic Data Science Structured data Unsupervised Learning Dimension Reduction 2 SECURING MACHINE LEARNING ALGORITHMS December 2021 31 Algorithm Name Definition Main domain Data type Data environments Learning Paradigm Explainability Accuracy Provided Refs Isolation forest The isolation forest returns the anomaly score of each sample.It isolates observations by randomly selecting a feature,and then randomly selecting a split value between the maximum and minimum values of the selected feature.Classic Data Science Structured data Unsupervised learning Anomaly detection 157,161 K-means K-means clustering is a method of vector quantification that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean(cluster centres or cluster centroid),serving as a prototype of the cluster.Classic Data Science Structured data Unsupervised Learning Clustering 129 K-Nearest Neighbour K-Nearest Neighbour is a simple algorithm that stores all the available cases and classifies the new data or case based on a similarity measure.It is mostly used to classify a data point based on how its neighbours are classified.Classic Data Science Structured data Supervised learning Classification Fully Explainable Yes 21,40,Linear regression Linear regression attempts to model the relationship between two or more variables by fitting a linear equation to observed data.One variable is considered to be an explanatory variable,and the other is considered to be a dependent variable.For example,a modeller might want to relate the weights of individuals to their heights using a linear regression model.Classic Data Science Structured data Supervised learning 
Regression Fully Explainable 2,117,221 Logistic regression Logistic regression is used to classify data by modelling the probability of a certain class or event existing such as pass/fail,win/lose,alive/dead or healthy/sick.This can be extended to model several classes of events such as determining whether an image contains a cat,dog,lion,etc.Classic Data Science Structured data Supervised learning Classification Fully Explainable Yes 2,120,177 LSTM Long short-term memory(LSTM)is an artificial recurrent neural network(RNN)architecture used in the field of deep learning.Unlike standard feedforward neural networks,LSTM has feedback connections.It cannot only process single data points(such as images),but also entire sequences of data(such as speech or video).NLP&Speech processing,computer vision Text,image,video Supervised learning Regression Not Explainable 22,44,45,46,47,50,84,85,131,158,161 SECURING MACHINE LEARNING ALGORITHMS December 2021 32 Algorithm Name Definition Main domain Data type Data environments Learning Paradigm Explainability Accuracy Provided Refs Mean shift Mean shift is a non-parametric feature-space analysis technique for locating the maxima of a density function,a so-called mode-seeking algorithm Computer Vision Image,video Unsupervised Learning Clustering 27 MobileNet MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions instead of convolutions,in order to build light wFeight deep neural networks.Computer Vision,Classic Data Science,NLP&Speech processing Image,video,text,time series,structured data Unsupervised learning Clustering Yes 4 Monte Carlo algorithm A Monte Carlo algorithm is a randomised algorithm whose output may be incorrect with a certain(typically small)probability.Classic Data Science Structured data Reinforcement learning Rewarding 30,105 Multimodal Parallel Network A Multimodal Parallel Network helps to manage audio-visual event localisation by processing both audio and visual signals at the same time.Computer Vision,Speech processing Video Supervised learning Classification 18 Naive Bayes classifiers Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes theorem with strong(nave)independence assumptions between the features.Classic Data Science Structured data Supervised learning Classification Fully Explainable Yes 39,40,89,120,210 Proximal Policy Optimisation A family of policy gradient methods for Reinforcement Learning that alternate between sampling data and optimising a surrogate objective function using stochastic gradient ascent.Classic Data Science Structured data,time series Reinforcement learning Rewarding Yes 137 Principal Component Analysis The main idea of principal component analysis(PCA)is to reduce the dimensionality of a data set consisting of many variables correlated with each other,either heavily or lightly,while retaining the variation present in the dataset,up to the maximum extent.Classic Data Science Structured data Unsupervised Learning Dimension Reduction 2 Q-learning Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state.It does not require a model of the environment.Classic Data Science Structured data,time series Reinforcement learning Rewarding Yes 28 Random forests Random forests are an ensemble learning method that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes(classification)or mean/average 
prediction(regression)of the individual trees.Classic Data Science Structured data Supervised learning Classification,Regression Globally Explainable 51,136,140 SECURING MACHINE LEARNING ALGORITHMS December 2021 33 Algorithm Name Definition Main domain Data type Data environments Learning Paradigm Explainability Accuracy Provided Refs Recurrent neural network A recurrent neural network(RNN)is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence.This allows it to exhibit temporal dynamic behaviour.Computer Vision,NLP&Speech processing Time series,text,image,video Supervised learning Regression Not Explainable 14,17,44,45,46,47,49,50,52,89,13 ResNet A residual neural network(ResNet)is an artificial neural network(ANN)that builds on constructs known from pyramidal cells in the cerebral cortex by utilising skip connections,or shortcuts to jump over some layers.Computer Vision Image Supervised learning Classification Not Explainable Yes 4,7,37 Spatial Temporal Graph Convolutional Networks Spatial Temporal Graph Convolutional Networks is a convolutional neural network that automatically learns both the spatial and temporal patterns from data.Computer Vision Video Supervised learning Classification 25 Stochastic gradient descent Stochastic gradient descent is an iterative method for optimising an objective function with suitable smoothness properties.It can be regarded as a stochastic approximation of gradient descent optimisation,since it replaces the actual gradient(calculated from the entire data set)by an estimate thereof(calculated from a randomly selected subset of the data).Classic Data Science Structured data/Optimisation 17,24 Support vector machine SVM are linear classifiers which are based on the margin maximisation principle.They accomplish the classification task by constructing,in a higher dimensional space,the hyperplane that optimally separates data into two categories.Classic Data Science Structured data Supervised learning Classification Fully Explainable Yes 42,47,51,67,69,87,89,92,98,106,120,136,139,142,152,177,185 WaveNet Wavenet is a deep neural network for generating raw audio waveforms.The model is fully probabilistic and autoregressive,with the predictive distribution for each audio sample conditioned on all previous ones NLP&Speech processing Time series Unsupervised learning NLP task 44,45,131,132 XGBoost XGBoost is an extension to gradient boosted decision trees(GBM)and specially designed to improve speed and performance by using regularisation methods to fight overfitting.Classic Data Science Structured data Supervised learning Classification,Regression Globally Explainable 3 SECURING MACHINE LEARNING ALGORITHMS December 2021 34 B ANNEX:MAPPING SECURITY CONTROLS TO THREATS Threats|sub-threats Vulnerabilities Security Controls Threats references Evasion Lack of detection of abnormal inputs Implement tools to detect if a data point is an adversarial example or not 13,34,37,48,49,51,53,56,59,60,62,65,66,67,73,80,81,82,83,84,90,95,97,100,107,109,110,121,125,139,144,154,155,162,163,169,170,175,181,183,185,199,200,201,202,204,205,206,207,209,211,213,215 Include ML applications in detection and response to security incident processes Poor consideration of evasion attacks in the model design implementation Choose and define a more resilient model design Lack of consideration of attacks to which ML applications could be exposed Integrate ML specificities to awareness strategy and ensure all ML stakeholders are 
receiving it Lack of training based on adversarial attacks Add some adversarial examples to the training dataset Use a widely known model allowing the attacker to study it Lack of security process to maintain a good security level of the components of the ML application Use less easily transferable models Assess the exposure level of the model used Inputs totally controlled by the attacker which allows for input-output-pairs Apply modifications to inputs Use of adversarial examples crafted in white or grey box conditions(e.g.FGSM)Too much information available on the model Reduce the available information about the model 34,35,48,51,56,59,60,62,65,80,81,82,100,109,110,125,139,144,154,170,204,209 Too much information about the model given in its outputs Reduce the information given about the model Oracle Poor access rights management Apply a RBAC model,respecting the least privileged principle 121,145,146,152,170,177,194,203,204,208,214 The model allows private information to be retrieved Ensure that models respect differential privacy Too much information about the model given in its outputs Reduce the information given about the model Too much information available on the model Reduce the available information about the model Lack of consideration of attacks to which ML applications could be exposed to Integrate ML specificities to awareness strategy and ensure all ML stakeholders are receiving it Lack of security process to maintain a good security level of the components of the ML application Implement processes to maintain security levels of ML components over time Weak access protection mechanisms for ML model components Ensure ML applications comply with identity management,authentication,and process control policies SECURING MACHINE LEARNING ALGORITHMS December 2021 35 Threats|sub-threats Vulnerabilities Security Controls Threats references Poisoning Model easy to poison Choose and define a more resilient model design 74,77,79,99,114,115,116,117,118,121,126,140,142,143,162,167,170,171,172,173,189,196,197,198,199,204,210 Implement processes to maintain security levels of ML components over time Assess the exposure level of the model used Lack of data for increasing robustness to poisoning Enlarge the training dataset Poor access rights management Apply a RBAC model,respecting the least privileged principle Poor data management Ensure ML applications comply with data security requirements Undefined indicators of proper functioning,making complex compromise identification Define and monitor indicators for proper functioning of the model Lack of consideration of attacks to which ML applications could be exposed to Integrate ML specificities to awareness strategy and ensure all ML stakeholders are receiving it Use of uncontrolled data Control all data used by the ML model Use of unsafe data or models(e.g with transfer learning)Ensure reliable sources are used Lack of control for poisoning Integrate poisoning control after the model evaluation phase No detection of poisoned samples in the training dataset Use methods to clean the training dataset from suspicious samples Weak access protection mechanisms for ML model components Ensure ML applications comply with identity management,authentication,and access control policies Label modification Use of unreliable source to label data Ensure reliable sources are used 125,140,204 Model or data disclosure Poor access rights management Apply a RBAC model,respecting the least privileged principle 121,194,221,222 Existence of unidentified 
disclosure scenarios Conduct a risk analysis of the ML application Weak access protection mechanisms for ML model components Ensure ML applications comply with identity management,authentication,and access control policies Lack of security process to maintain a good security level of the components of the ML application Implement processes to maintain security levels of ML components over time Unprotected sensitive data on test environments Ensure appropriate protection are deployed for test environments as well Data disclosure Too much information about the model given in its outputs Integrate ML specificities to awareness strategy and ensure all ML stakeholders are receiving it The model can allow private information to be retrieved Ensure that models respect differential privacy The model can allow private information to be retrieved Reduce the information given by the model Disclosure of sensitive data for ML algorithm training Use federated learning to minimise the risk of data breaches Model disclosure Too much information available on the model Reduce the available information about the model Too much information about the model given in its outputs Reduce the information given by the model SECURING MACHINE LEARNING ALGORITHMS December 2021 36 Threats|sub-threats Vulnerabilities Security Controls Threats references Compromise of ML application components Poor access rights management Apply a RBAC model,respecting the least privileged principle 121,164,183,189 Too much information available on the model Reduce the available information about the model Existence of several vulnerabilities because the ML application was not integrated into process for integrating security into projects Ensure ML projects follow the global process for integrating security into projects Use of vulnerable components(among the whole supply chain)Check the vulnerabilities of the components used so that they have an appropriate security level Too much information about the model given in its outputs Reduce the information given by the model Existence of unidentified compromise scenarios Conduct a risk analysis of the ML application Undefined indicators of proper functioning,making complex compromise identification Define and monitor indicators for proper functioning of the model Bad practices due to a lack of cybersecurity awareness Integrate ML specificities to awareness strategy and ensure all ML stakeholders are receiving it Lack of security process to maintain a good security level of the components of the ML application Implement processes to maintain security levels of ML components over time Ensure ML applications comply with protection policies and are integrated to security operations processes Weak access protection mechanisms for ML model components Ensure ML applications comply with identity management,authentication,and access control policies Existence of several vulnerabilities because ML specificities are not integrated to existing policies Integrate ML specificities to existing security policies Existence of several vulnerabilities because ML application do not comply with security policies Ensure ML applications comply with security policies Include ML applications into asset management processes Contract with a low security third party Ensure ML applications comply with third parties security requirements Failure or malfunction of ML application Existing biases in the ML model or in the data Ensure that models are unbiased 121,164,183,189,191 Lack of consideration of real-life conditions 
in training the model Ensure that the model is sufficiently resilient to the environment in which it will operate.ML application not integrated in the cyber-resilience strategy Integrate ML applications into the overall cyber-resilience strategy Existence of unidentified failure scenarios Conduct a risk analysis of the ML application Undefined indicators of proper functioning,making complex malfunction identification Define and monitor indicators for proper functioning of the model Lack of explainability and traceability of decisions taken Build explainable models Lack of security process to maintain a good security level of the components of the ML application Implement processes to maintain security levels of ML components over time SECURING MACHINE LEARNING ALGORITHMS December 2021 37 Threats|sub-threats Vulnerabilities Security Controls Threats references Existence of several vulnerabilities because ML specificities are not integrated to existing policies Ensure ML projects follow the global process for integrating security into projects Contract with a low security third party Ensure ML applications comply with third parties security requirements Application not compliant with applicable regulations Assess the regulations and laws the ML application must comply with Human error Poor access rights management Apply a RBAC model,respecting the least privilege principle Lack of documentation on the ML application Apply documentation requirements to AI projects Include ML applications into asset management processes Denial of service due to inconsistent data or a sponge example Use of uncontrolled data Control all data used by the ML model Cybersecurity incident not reported to incident response teams Lack of cybersecurity awareness Integrate ML specificities to awareness strategy and ensure all ML stakeholders are receiving it SECURING MACHINE LEARNING ALGORITHMS December 2021 38 C ANNEX:IMPLEMENTING SECURITY CONTROLS Security controls Examples for operational implementation References Add some adversarial examples to the training dataset The literature provides the following techniques:-Adversarial Training-Ensemble Adversarial Training-Cascade Adversarial Training-Principled Adversarial Training-Gradient Band Based Adversarial Training 13,23,48,51,59,65,72,95,108,162,200,201,202,211,215 Apply a RBAC model,respecting the least privilege principle The NIST 800-53 and the ISO 27001/2 provides several points:-Manage access permissions and authorisations,incorporating the principles of least privilege and separation of duties-Manage the identity of the users(Couple lifecycle management processes and procurement processes etc.)ISO 27001/2 NIST 800-53,162 Apply documentation requirements to Artificial Intelligence projects The NIST 800-53 and the ISO 27001/2 provides several points:-Define change management processes,integrating the update of the documentation 148 ISO 27001/2 NIST 800-53 Apply modifications on inputs The literature provides the following techniques:-Data randomisation-Input transformation-Input denoising 64,65,108,208 Assess the exposure level of the model used Assess the regulations and laws the ML application must comply with The NIST 800-53 and the ISO 27001/2 provides several points:-Identify applicable legislation-Meet the requirements of GDPR for personal data ISO 27001/2 NIST 800-53,162 Build explainable models The literature provides the following techniques:-Interpret models with some tools-Use model more explainable like regression instead of Deep Neural Network for 
supervised learning when it is necessary 164,225,226,227 SECURING MACHINE LEARNING ALGORITHMS December 2021 39 Security controls Examples for operational implementation References Check the vulnerabilities of the components used so that they have an appropriate security level The NIST 800-53 and the ISO 27001/2 provides several points:-Manage the exemptions by following it industrially,also including remediation plans-Make an inventory of the infrastructure equipment,the applications(Define,document,improve and review a regular process to make inventory)-Manage the maintenance,the obsolete assets etc.(Define a process and continuously improve it,define a roadmap to replace obsolescent technologies)-Implement a vulnerability management policy(Control regularly its implementation)-Perform and manage vulnerability scans on servers OS,middleware,database and network infrastructure(Perform regularly automatics scans)ISO 27001/2 NIST 800-53 Choose and define a more resilient model design For poisoning,the literature provides the following technique:-Bagging or weight Bagging-TRIM algorithm For evasion,the literature provides the following technique:-Randomisation-Stability terms into objective function-Adversarial perturbation-based regulariser-Input gradient regularisation-Defensive distillation-Random feature nullification 35,65,108,110,112,113,114,143,196,206,213 Conduct a risk analysis of the ML application The NIST 800-53 and the ISO 27001/2 provides several points:-Coordinate the compliance process with legal and audit functions-Identify legal requirements(e.g.GDPR or NIS for European countries)-Establish a methodology to manage identified risks-Establish a formal methodology to analyse risk-Define and monitor the IT resource availability(Formalise a capacity management plan)ISO 27001/2 NIST 800-53 Control all data used by the ML model The literature provides the following techniques:-Data sanitisation-RONI and tRONI technics-Point out important data and put a human in the loop(Human in the loop)114,116,118,142,162,197,228 Define and monitor indicators for proper functioning of the model The NIST 800-53 and the ISO 27001/2 provides several points:-Formalise a dashboard,bringing together a series of indicators enabling the state of the information system to be judged in relation to the objectives set.-Take actions in case of deviation from the objective-Manage changes on assets,ensure changes will not impact the production and detect any changes in assets baseline configuration-Guarantee the integrity of the code at all stage(Perform integrity control etc.)ISO 27001/2 NIST 800-53 SECURING MACHINE LEARNING ALGORITHMS December 2021 40 Security controls Examples for operational implementation References Enlarge the training dataset The literature provides the following technique:-Data augmentation 68,150 Ensure appropriate protection are deployed for test environments as well The NIST 800-53 and the ISO 27001/2 provides several points:-Protect data when there are in non-production environment(implement desensitisation measures etc.)ISO 27001/2 NIST 800-53 Ensure that the model is sufficiently resilient to the environment in which it will operate.Ensure ML applications comply with Data Security requirements The NIST 800-53 and the ISO 27001/2 provides several points:-Apply a methodology for data classification.Review the classification regularly-Implement measures to detect data leakage on the Internet(Antivirus,Data right management solution on all sensitive folders)-Secure sensitive data in 
transit(Deploy mechanisms to detect bypasses on all networks)-Deploy security solutions on network points to prevent data leaks(DLP etc.)ISO 27001/2 NIST 800-53 Ensure ML applications comply with identity management,authentication and access control policies The NIST 800-53 and the ISO 27001/2 provides several points:-Define a policy regarding users authentication(Define an authentication policy that considers the sensitivity of resources and the connection context for all types of account,use Multi-Factor Authentication)-Define a remote access policy(Verify security configuration,authenticate connected devices)ISO 27001/2 NIST 800-53,162 Ensure ML applications comply with protection policies and are integrated to security operations processes The NIST 800-53 and the ISO 27001/2 provides several points:-Manage the maintenance,the obsolete assets etc.(Define a process and continuously improve it,define a roadmap to replace obsolescent technologies)-Implement a vulnerability management policy(Control regularly its implementation)-Perform and manage vulnerability scans on servers OS,middleware,database and network infrastructure(Perform regularly automatics scans)ISO 27001/2 NIST 800-53 Ensure ML applications comply with security policies The NIST 800-53 and the ISO 27001/2 provides the following point:-Define policies for information security ISO 27001/2 NIST 800-53 Ensure ML applications comply with third parties security requirements The NIST 800-53 and the ISO 27001/2 provides several points:-Integrate the security into contracts(Define a security insurance plan for strategic contracts representing high risks for company security.Communicate roles and responsibilities to every new third party)-Monitor and review third parties services ISO 27001/2 NIST 800-53 SECURING MACHINE LEARNING ALGORITHMS December 2021 41 Security controls Examples for operational implementation References Ensure ML projects follow the global process for integrating security into projects The NIST 800-53 and the ISO 27001/2 provides several points:-Integrate the security into contracts(Define a security insurance plan for strategic contracts representing high risks for company security.Communicate roles and responsibilities to every new third party)-Define and manage a patch management policy-Manage the interconnections with external systems(establish a formal process to regularly review the exhaustiveness of interconnections inventory)-Integrate and manage security protection for applications(firewalls,WAF,reverse proxy)-Perform security controls on application ISO 27001/2 NIST 800-53 Ensure reliable sources are used The NIST 800-53 and the ISO 27001/2 provides several points:-Manage the interconnections with external systems(establish a formal process to regularly review the exhaustiveness of interconnections inventory)ISO 27001/2 NIST 800-53 Ensure that models are unbiased The literature provides the following techniques:-Classification parity-Calibration-Anti-classification-Having a diverse dataset-Some other technics:samples bias,measurement error 217,229 Ensure that models respect differential privacy to a sufficient degree The literature provides the following techniques:-Model design adapted like PATE for Deep Neural Network based classifier-Data randomisation-Randomisation-Objective function perturbation 63,108,112,194,203,208,214 Implement processes to maintain security levels of ML components over time The NIST 800-53 and the ISO 27001/2 provides the following point:-Perform technical and organisational 
audits regularly on critical scope and develop an action plan after each audit ISO 27001/2 NIST 800-53 Implement tools to detect if a data point is an adversarial example or not The literature provides the following technique:-Adding detector subnetworks 65,207 Include ML applications into asset management processes The NIST 800-53 and the ISO 27001/2 provides several points:-Create an inventory of the infrastructure equipment,the applications(Define,document,improve and review a regular process to make inventory)-Classify information-Manage the maintenance,the obsolete assets etc.(Define a process and continuously improve it,define a roadmap to replace obsolescent technologies)
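To illustrate one of the operational implementations listed in this annex, "Add some adversarial examples to the training dataset" (adversarial training), the following is a minimal sketch using an FGSM-style perturbation in PyTorch. The model, optimiser, epsilon value and the assumption that inputs are scaled to [0, 1] are illustrative choices rather than a prescribed implementation.

```python
# Minimal sketch of adversarial training with FGSM-crafted examples (PyTorch).
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    """Craft an FGSM adversarial example: one signed-gradient step on the input."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Assumes inputs scaled to [0, 1]; adjust the clamp range otherwise.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """Train on a mix of clean and adversarial examples (the control named above)."""
    x_adv = fgsm_example(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```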
