March 5, 2025

We Messed Up the Ethical Foundations of AI.

Srikanth Bala
Meaning of Transparency, Predictability, and Robustness

Artificial intelligence is increasingly used for choices that shape people's lives, so the standards we ask of public institutions should guide these systems too. According to Bostrom and Yudkowsky in The Cambridge Handbook of Artificial Intelligence, three qualities make such systems worthy of trust: transparency, predictability, and robustness.

What they mean by transparency and predictability is simple enough to test in real life. People should be able to see and anticipate how an AI decides, while robustness should act as a buttress against interference, so those decisions are not easy to manipulate. Together, these qualities show whether a system is fit to hold real authority rather than operate as an opaque tool.

Transparency, as Bostrom and Yudkowsky frame it, is practical auditability. Observers should be able to trace which inputs mattered, reconstruct a reasoning path, and identify who is responsible for outcomes, rather than being told the model is neutral but inscrutable.

Predictability is the capacity for steady, precedent-like behavior, so that similar cases receive similar treatment and citizens can plan in advance. They compare this to the stability of contract law and judicial rulings.

Robustness is an AI system's security posture. It should resist strategic attempts to fool it, such as stickers that make a stop sign read as a speed limit sign, rather than merely performing well on average test sets. People sometimes worry that openness makes manipulation easier. Bostrom and Yudkowsky answer by asking for clear logs, outside checks, and explanations pitched at the right level, so the system stays both accountable and strong.

Application to Deep Learning

Deep learning is a good test case because it learns patterns from massive data and then drives decisions in vision, speech, and medical tasks. Burgard makes this point in The Cambridge Handbook of Responsible Artificial Intelligence, and it lands when you look at how these models are used in hospitals and on the street.

Transparency here means more than opening the code. It means a clear path from input to output: feature attributions, example-based explanations, and decision logs that let an auditor see why two similar images or records led to different results, a standard Bostrom and Yudkowsky insist on.
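
As a concrete sketch of what those artifacts could look like in code, here is a minimal gradient-based attribution plus a structured decision-log entry, written in PyTorch. The model interface, field names, and log format are illustrative assumptions, not anything the handbook specifies.

```python
import datetime
import hashlib
import json

import torch


def gradient_saliency(model, x):
    """Feature attribution via the gradient of the top score w.r.t. the input.

    Assumes a tabular input of shape (1, n_features); a stand-in for richer
    attribution methods, not a prescription from the handbook.
    """
    x = x.clone().requires_grad_(True)
    scores = model(x)
    top_class = scores[0].argmax()
    scores[0, top_class].backward()
    return x.grad.abs().squeeze(0)  # per-feature importance


def log_decision(model_version, x, prediction, saliency, path="decisions.jsonl"):
    """Append an auditable record: what came in, what went out, and why."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_version": model_version,  # which model answered
        "input_hash": hashlib.sha256(x.detach().cpu().numpy().tobytes()).hexdigest(),
        "prediction": int(prediction),
        "top_features": saliency.topk(5).indices.tolist(),  # what mattered most
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

An auditor reading such a log can line up two near-identical records and see which features drove the different outcomes, which is the practical sense of transparency at issue here.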

Predictability asks for consistent behavior across time and context. A face model or a loan model should treat like cases alike after retraining and across updates, so users can plan around it, much as they rely on steady rules in law.
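
One way to make that check concrete is a like-case suite run before and after every retrain. The sketch below assumes scikit-learn-style models with a .predict method and reviewer-chosen case pairs; both the interface and the pairing are illustrative assumptions.

```python
def like_case_consistency(old_model, new_model, case_pairs):
    """Flag pairs of 'like cases' whose treatment diverges after an update.

    case_pairs: list of (x_a, x_b) inputs that reviewers consider equivalent.
    Returns the pairs the old model treated alike but the new model splits.
    """
    regressions = []
    for x_a, x_b in case_pairs:
        old_same = old_model.predict([x_a])[0] == old_model.predict([x_b])[0]
        new_same = new_model.predict([x_a])[0] == new_model.predict([x_b])[0]
        if old_same and not new_same:
            regressions.append((x_a, x_b))
    return regressions
```

A non-empty result is a signal to investigate, or to roll back, before the update reaches users who planned around the old behavior.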

Robustness matters because small, crafted changes can flip labels. Goodfellow and colleagues, in their work on adversarial examples, showed how tiny perturbations can push a classifier into error. The fix is to test against adversarial inputs and out-of-distribution shifts before deployment, not only on clean test sets. Put together, these standards turn deep learning from a black-box classifier into a service people can inspect, anticipate, and trust when it moves into safety-relevant settings like health, transport, or identity checks, as Burgard argues.
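
The perturbation Goodfellow and colleagues describe, the fast gradient sign method, takes only a few lines to sketch in PyTorch. The epsilon value, the [0, 1] input range, and the cross-entropy loss are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast gradient sign method: nudge each input in the direction that most
    increases the loss, bounded by epsilon (assumes inputs scaled to [0, 1])."""
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()


def robustness_check(model, x, y, epsilon=0.03):
    """Compare accuracy on clean inputs with accuracy under FGSM perturbations."""
    clean_acc = (model(x).argmax(dim=1) == y).float().mean().item()
    x_adv = fgsm_attack(model, x, y, epsilon)
    adv_acc = (model(x_adv).argmax(dim=1) == y).float().mean().item()
    return clean_acc, adv_acc
```

A large gap between the two accuracies is exactly the failure mode that clean test sets hide.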

Ethical Risks in Deep Learning and Design Implications

To prevent the shortcomings above, require any high-stakes AI to ship with a versioned safety case that is independently audited before deployment and re-certified after material updates. NIST's AI Risk Management Framework points in this direction.

The safety case compiles three kinds of proof. First, practical auditability artifacts, such as input salience, example-based explanations, and decision logs with accountability trails, so reviewers can trace why near-identical cases diverged, an idea straight from Bostrom and Yudkowsky. Second, predictability checks that demonstrate policy stability across retrains and contexts, using held-out like-case suites and rollback plans. Third, robustness evidence from adversarial red teaming and out-of-distribution stress tests rather than averages on clean test sets, following Goodfellow's line of work.
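
To show how those three kinds of proof might travel together, here is a minimal sketch of a versioned safety case as a data structure. The field names and the deployability rule are assumptions for illustration, not a schema from NIST or the handbook.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SafetyCase:
    """A versioned evidence bundle for an outside assessor to review."""
    model_version: str
    audit_artifacts: list = field(default_factory=list)      # attributions, decision logs
    like_case_results: dict = field(default_factory=dict)    # predictability suites per retrain
    stress_test_results: dict = field(default_factory=dict)  # adversarial / OOD outcomes
    certified_on: Optional[date] = None                      # set by the outside assessor

    def is_deployable(self) -> bool:
        """No audited, up-to-date safety case, no production use."""
        return (
            self.certified_on is not None
            and bool(self.audit_artifacts)
            and bool(self.like_case_results)
            and bool(self.stress_test_results)
        )
```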

An outside assessor evaluates the package against a public rubric and publishes a brief summary. If post-deployment monitoring finds drift, bias, or new attacks, certification pauses until the shortcomings are addressed and verified. This aligns with risk management practice and keeps openness calibrated, with explanations and logs at the right level for accountability without handing out exploit recipes. It also ties authority to observed behavior, not claims of neutrality, a theme you hear again in Bostrom and Yudkowsky and in Burgard's guidance. In short, no audited, up-to-date safety case, no production use.
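
A post-deployment monitor that can pull certification could be as simple as a distribution-shift check on the model's outputs. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy, which is one illustrative choice among many; the threshold is an assumption.

```python
from scipy.stats import ks_2samp


def drift_gate(reference_scores, live_scores, p_threshold=0.01):
    """Pause certification when live behavior drifts from certified behavior.

    Compares the model's live output scores against the scores recorded at
    certification time; a significant shift flags the system for review.
    """
    statistic, p_value = ks_2samp(reference_scores, live_scores)
    still_certified = p_value >= p_threshold  # drift detected -> pull certification
    return still_certified, statistic, p_value
```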
