The Apple Card: Why enterprises need AI governance

Goldman Sachs is the latest in a series of high-profile companies under scrutiny for developing AI algorithms with alleged implicit biases, such as gender or race discrimination. Do biases in these algorithms really exist? Quite possibly, but because of the nature of the technology we often can’t see into AI engines to know exactly what drives certain outcomes. In this post we discuss why we continue to see bias issues within AI, what the “black box” is and how enterprises can employ AI governance and build trusted AI systems.

In this latest example, the Apple Card (issued by Goldman Sachs) appeared to be giving lower credit limits to women, irrespective of credit rating. The concern surfaced in a string of tweets by tech entrepreneur and Ruby on Rails creator David Heinemeier Hansson, who claimed that his wife had received a credit limit 20x lower than his, despite her more favourable credit score. Goldman defended itself by saying that its “credit decisions are based on a customer’s creditworthiness and not on factors like gender, race, age, sexual orientation or any other basis prohibited by law”, but by being unable to say more, it eroded trust in the application of algorithmic decision-making.


How can bias creep into AI algorithms?

Machine learning is about learning from past data and enabling decision automation. As a technology, artificial intelligence is neutral in the sense that it is a mathematical tool devoid of the kind of prejudice and emotional blindspots that drive human bias. But it is also blind to the fact that it may be inappropriate to use race or gender as a basis for awarding credit; to the algorithm, it is just another variable. Algorithms are amoral — it is the organisations and human beings that apply AI that run the risk of immorality.

The danger with black box systems is that unintended biases can creep in and remain unverified. With machine learning systems, one source of bias is the training data itself, which may well reflect societal prejudices that the algorithm then inherits. For example, if you train a predictive policing algorithm on the historical arrest and conviction patterns of a police force already rife with institutional bias, the algorithm will simply inherit that bias.

Furthermore, powerful machine learning algorithms may mine and pick out underlying proxies that are highly correlated with a prohibited variable (e.g. gender or race), even if the sensitive variable itself is omitted from the dataset. Let’s apply this to a bank trying to train a model to assess creditworthiness. The bank may indeed have removed gender from the training dataset, but if it kept in “maiden name”, for example, then the algorithm could pick up non-empty values in this variable as a proxy for being female. This is why models need to be audited regularly for implicit biases.
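To make the proxy problem concrete, here is a toy audit sketch in Python. The records and the proxy rule are entirely made up for illustration; a real audit would run the same kind of check over the bank’s actual dataset.

```python
# Toy audit: even with gender removed from the features, a proxy field
# can reconstruct it. All data here is illustrative.

records = [
    {"maiden_name": "Smith",  "gender": "F"},
    {"maiden_name": "",       "gender": "M"},
    {"maiden_name": "Lee",    "gender": "F"},
    {"maiden_name": "",       "gender": "M"},
    {"maiden_name": "Okafor", "gender": "F"},
    {"maiden_name": "",       "gender": "F"},  # the proxy is imperfect, but far from useless
]

def proxy_accuracy(rows):
    """How often does 'maiden_name is non-empty' predict gender == 'F'?"""
    hits = sum((r["maiden_name"] != "") == (r["gender"] == "F") for r in rows)
    return hits / len(rows)

print(proxy_accuracy(records))  # 5/6: the "removed" attribute leaks back in
```

An auditor would run checks like this for every feature against every protected attribute, and flag any field whose predictive power is suspiciously high.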

In what ways are AI systems black boxes?

It has become almost trite to say that AI systems are black boxes. But there are many reasons why they are, or why they appear that way to enterprises and their customers.

Firstly, deep learning and other advanced machine learning algorithms can be difficult to understand, even to practitioners. The nonlinearity and sheer number of features driving the models can be pretty overwhelming. We can visualise 2 or 3 dimensions and how 2 or 3 features might interact to generate a prediction, but it becomes hugely complex when you reach a hundred (or a million) of these features.

But it is not inevitable that deep learning models are inexplicable: methods exist that allow us to peer into these black boxes. The interactive visualisation TensorFlow Playground gives you insight into what each neuron in a neural network is “thinking” and how those thoughts contribute to the final decision. Methods such as LIME (Local Interpretable Model-agnostic Explanations) also allow us to understand which pixels drive the decisions of computer vision algorithms when they classify objects.
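LIME itself works by fitting a local surrogate model around a single prediction. A minimal hand-rolled cousin of that idea can be sketched in a few lines: perturb one feature of one instance at a time and watch how the black box’s prediction moves. The model and numbers below are stand-ins, not a real credit model.

```python
# A minimal perturbation-based attribution sketch (a simplified cousin of LIME):
# nudge each feature of one instance to a baseline and measure the prediction shift.

def black_box(features):
    income, debt, tenure = features
    # Pretend we cannot see inside this function.
    return 0.6 * income - 0.3 * debt + 0.1 * tenure

def attribution(model, instance, baseline):
    """Score each feature by how much the prediction moves when it is reset to baseline."""
    scores = []
    for i in range(len(instance)):
        perturbed = list(instance)
        perturbed[i] = baseline[i]
        scores.append(model(instance) - model(perturbed))
    return scores

applicant = [80.0, 20.0, 5.0]
scores = attribution(black_box, applicant, [0.0, 0.0, 0.0])
print(scores)  # the largest score identifies the feature driving this prediction most
```

Real tools are more careful (they sample many perturbations and weight them by proximity), but the principle is the same: you do not need to open the model to learn which inputs drive a given decision.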

Secondly, for commercial reasons, providers of AI systems may prefer them to be black boxes. There can often be a good reason for wanting to prevent a client from learning about how the system works, for example, protecting intellectual property or avoiding adversarial attacks on the AI. On the flip side, there may be no good reason at all, such as pretending that the black box is driven by sophisticated machine learning, when in fact it isn’t at all.

Thirdly, the absence of internal capability in enterprises can lead to black boxes. A solution provider can provide all the underlying code and data, but if the enterprise doesn’t have internal teams who can make sense of it, then it’s effectively a black box. It’s the lack of knowledge that creates the opacity.

The other phenomenon we see is how, with internally developed solutions, the lack of the right processes can lead to black boxes. AI scientists are often not trained as software engineers and may be unfamiliar with checking their code into a version control system such as Git. To add to the complexity, machine learning engines have many more elements to keep track of than just code: training data, model parameters, and details about how predictions are served and presented to other systems, among others. Most organisations don’t have a systematic platform that maintains a robust digital audit trail and, more importantly, a process that enables error detection and the ability to fix or roll back a model when mistakes happen.

Towards responsible artificial intelligence

In order to create trust in the use of AI systems, it is not enough to treat them as an ‘intelligent’ black box, content with knowing the inputs and outputs of a machine learning model but not its internal workings. Robust AI engines require oversight, ongoing monitoring and compliance. Ultimately, it is up to humans to ensure that machine learning systems are put to good use and are accountable.

Cases such as the Apple Card continue to raise the profile of AI governance in big tech. Fintech companies, among many other industries, are putting machine learning to an increasing number of practical uses, including credit risk evaluation. The need for governance and explainability becomes ever more pertinent as consumers demand to know how and why “unfair” or incorrect decisions are being made about important things in their lives, such as creditworthiness.

‘AI governance’ might be the next term on your buzzword bingo card, but the reality is that business use of AI has grown by 270% in the last four years, much faster than companies have adopted strategies to safeguard against machine learning mishaps. The term refers to the ability of an organisation to ensure that its decision-making algorithms work as intended, continuously and throughout their use.

For enterprises, this means:

  • Oversight: the ability to continuously track the performance of the entire multitude of AI systems on metrics that matter to the organisation, including accuracy, which drives business outcomes, and bias, which bears on ethical responsibility
  • Maintainability: the ability to identify when things go wrong, to be able to quickly diagnose and fix these mistakes
  • Auditability and explainability: the ability to analyse, trace and investigate drivers of how decisions are made
  • Trade-offs: governance issues inevitably involve judgement, but a well-governed system surfaces the key information that senior decision makers need to make trade-offs and execute them across the organisation; for example, the ability to systematically generate fair, unbiased models and to quantify both the fairness and the resulting loss of profit.
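The last bullet can be made concrete with a toy calculation: report a fairness metric (here, the approval-rate gap between two groups) alongside a business metric, so the trade-off is visible to decision makers. The decisions and profits below are made-up numbers, purely for illustration.

```python
# Illustrative: quantify one fairness metric next to one business metric.
# (group, approved, profit_if_approved) — all values are fabricated.
decisions = [
    ("A", True, 120), ("A", True, 90), ("A", False, 0), ("A", True, 60),
    ("B", True, 110), ("B", False, 0), ("B", False, 0), ("B", True, 70),
]

def approval_rate(rows, group):
    outcomes = [approved for grp, approved, _ in rows if grp == group]
    return sum(outcomes) / len(outcomes)

gap = approval_rate(decisions, "A") - approval_rate(decisions, "B")
profit = sum(p for _, approved, p in decisions if approved)
print(round(gap, 2), profit)  # 0.25 450 — a 25-point approval gap alongside total profit
```

With both numbers on the table, leadership can decide explicitly how much approval-rate gap is acceptable for how much profit, rather than leaving the trade-off implicit in the model.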

So how can enterprises employ AI governance?

The good news is that governance is possible and the “black box” can be opened up. There are a number of ways in which companies can implement robust governance in their AI practices. We’ve summarised four key strategies to consider for responsible AI application:

1. Build core teams that can take ownership of your AI decision-making systems while retaining control of your data. Too many organisations see AI systems as a commodity or a widget that is universally applicable. But decision-making more often than not needs to be contextualised to your business. Be very careful when using pre-trained or automatically trained models: when their assumptions differ from your business context, things go awry. We recommend building a compact core data and machine learning team to drive your AI initiative; or, if you don’t have the right team, working with experts who can help you build and maintain bespoke models and who take a capability-building approach, rather than just selling you a black box.

2. Use version control and digital audit trails. Make sure your systems enable version control and allow tracking of all parts of the AI development and deployment lifecycle, so you know which model was giving which results, and when. Had the Apple Card/Goldman situation been the result of a newly deployed AI algorithm, the right technology would have let these organisations investigate and switch back to an older version of the model in a few clicks. Digital audit trails make it easier to roll back, explain and debug.
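The shape of such an audit trail can be sketched as a minimal model registry. The class and field names below are illustrative, not any real platform’s API; the point is that deployments are append-only records and rollback just re-activates an earlier entry.

```python
# Sketch of a minimal model registry with rollback — the kind of audit trail
# that lets a team switch back to an older model in a few clicks.
import datetime

class ModelRegistry:
    def __init__(self):
        self.versions = []   # append-only audit trail of deployments
        self.active = None   # version currently serving predictions

    def deploy(self, version, training_data_hash, params):
        self.versions.append({
            "version": version,
            "training_data_hash": training_data_hash,  # ties the model to its data
            "params": params,
            "deployed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })
        self.active = version

    def rollback(self):
        """Re-activate the previous version; the trail itself is never rewritten."""
        if len(self.versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.active = self.versions[-2]["version"]

registry = ModelRegistry()
registry.deploy("v1", "sha256:ab12", {"max_depth": 4})
registry.deploy("v2", "sha256:cd34", {"max_depth": 6})
registry.rollback()
print(registry.active)  # v1 — with the full deployment history intact for auditors
```

Recording the training data hash and parameters alongside the code version is what makes the trail useful: an auditor can reconstruct exactly which data and configuration produced which decisions.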

3. Monitor AI in production on an ongoing basis. AI models degrade. If Apple Card’s algorithm was an old model that degraded over time, monitoring would have been crucial to flag the falling performance. Make sure things are still working as intended in production, and audit your models for implicit biases. Simply removing race isn’t enough if you have postal code as a field and certain postal codes correlate strongly with race: the model will pick up postal code as a proxy for race.
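A simple form of such monitoring is a rolling-accuracy check that raises a flag when performance slips. The window size and threshold below are illustrative choices, not recommendations.

```python
# Sketch of production monitoring: track rolling accuracy and flag degradation.
from collections import deque

class PerformanceMonitor:
    def __init__(self, window=100, alert_below=0.8):
        self.outcomes = deque(maxlen=window)  # 1 = correct prediction, 0 = wrong
        self.alert_below = alert_below

    def record(self, correct):
        self.outcomes.append(1 if correct else 0)

    def degraded(self):
        """True once rolling accuracy in the window drops below the threshold."""
        if not self.outcomes:
            return False
        return sum(self.outcomes) / len(self.outcomes) < self.alert_below

monitor = PerformanceMonitor(window=10, alert_below=0.8)
for correct in [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]:  # the model starts slipping
    monitor.record(correct)
print(monitor.degraded())  # True: 7/10 = 0.7 < 0.8 — time to investigate or roll back
```

The same pattern extends to bias metrics: track approval-rate gaps or proxy correlations in the live prediction stream, not just accuracy, so drift towards unfairness is caught as early as drift in performance.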

4. Invest in tools that help you to do all of the above. AI scientists and machine learning engineers need help to automate and govern their work. There’s also an increasing need to bridge the gap between data science and DevOps.

BasisAI’s platform, Bedrock, helps enterprises build better-governed AI systems while accelerating their machine learning efforts. If you want to learn more about AI governance, or about Bedrock, get in touch with our team of data scientists.