Speaker: Jeannette M. Wing

Existing formal verification methods need to be adapted for AI.

Suggested model (mapping the classical verification setup onto AI systems):

- Specification: includes a model of the data, e.g. the stochastic process/distribution generating it

- System: the model itself (i.e. code)

- Verification technique: probabilistic logic/interval analysis

- Guarantees: probabilistic/stochastic
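The probabilistic flavor of these guarantees can be illustrated with a Monte Carlo check: instead of proving a property outright, we estimate the probability that it holds over the data distribution. A minimal sketch; `model`, `prop`, and `sample` are hypothetical stand-ins, not from the talk:

```python
import numpy as np

def holds_with_probability(model, prop, sample, n=10_000, seed=0):
    """Estimate Pr[prop(model, x)] by Monte Carlo over the data distribution.

    `sample(rng)` draws one input from the (assumed) data distribution;
    `prop(model, x)` checks the property on a single input.
    """
    rng = np.random.default_rng(seed)
    hits = sum(prop(model, sample(rng)) for _ in range(n))
    return hits / n

# Toy instance: the "model" doubles its input; the property is that the
# output stays in [-4, 4] when inputs are drawn from a standard normal.
model = lambda x: 2 * x
prop = lambda m, x: abs(m(x)) <= 4.0
sample = lambda rng: rng.normal()

p_hat = holds_with_probability(model, prop, sample)
```

The result is a statement of the form "the property holds with probability roughly `p_hat`", rather than the universal guarantee of classical verification.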

The role of data:

- Collecting and partitioning data

- Specifying "unseen" data

- What do we quantify over?

- How do we verify?

#### Specifying Unseen Data

How do we specify the properties of the data?

We may know the distribution exactly, but this is unlikely for real-world problems.

If we make assumptions/approximations, how do these differ from our model?

Approaches:

- Use a repertoire of statistical tools

- Assume that the initial specification is small & simple, and then use it to bootstrap an
*iterative refinement process*.
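A minimal sketch of such a refinement loop, assuming the specification is just an interval over a scalar feature (all names and the quantile-based rule are illustrative, not from the talk):

```python
import numpy as np

def refine_spec(spec, batch, q=0.99):
    """One refinement step: widen the interval spec (lo, hi) so that it
    covers the bulk (central q mass) of each newly observed batch."""
    lo, hi = spec
    lo = min(lo, np.quantile(batch, 1 - q))
    hi = max(hi, np.quantile(batch, q))
    return (lo, hi)

rng = np.random.default_rng(0)
spec = (0.0, 0.0)  # deliberately small & simple initial specification
for _ in range(5):  # bootstrap: refine as new data batches arrive
    spec = refine_spec(spec, rng.normal(size=1000))
```

After a few batches the interval settles around the central mass of the (here standard normal) data distribution, giving a data specification that was learned rather than written down up front.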

#### What do we quantify over?

In traditional formal methods we strive to prove that ∀x. P(x), for some property P and a variable x representing e.g. all inputs or parameters.

**But** for AI systems we don't expect a property to hold for all input data.

Possible solutions:

- Restrict x to come from a class of distributions.
- E.g. for robustness, in the adversarial ML setting, we might want to show that M is robust to all norm-bounded perturbations of D. Or even, we might want to show M is robust to all "semantic" or "structural" perturbations for the task at hand (e.g. CV).
- E.g. for fairness, we might want to show the model is fair on a given dataset and all unseen datasets that are "similar" (for some formal notion of "similar") → cites research indicating that small changes in a classifier can turn it from fair to unfair.
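For intuition on norm-bounded robustness, the simplest case of a linear classifier admits an exact certificate: the prediction at x cannot flip under any L∞ perturbation of size ≤ ε iff the margin |w·x + b| exceeds ε·‖w‖₁. A sketch of that check (this standard linear-case result is mine, not the talk's):

```python
import numpy as np

def certify_linf_robust(w, b, x, eps):
    """Exact robustness certificate for a linear classifier sign(w.x + b):
    no L-inf perturbation of radius <= eps can change the margin by more
    than eps * ||w||_1, so the prediction is safe iff margin exceeds that."""
    margin = abs(w @ x + b)
    return margin > eps * np.abs(w).sum()

w = np.array([1.0, -2.0])
b = 0.5
x = np.array([2.0, 0.0])   # margin = |2.5|, ||w||_1 = 3

print(certify_linf_robust(w, b, x, eps=0.5))  # 0.5 * 3 = 1.5 < 2.5 -> True
print(certify_linf_robust(w, b, x, eps=1.0))  # 1.0 * 3 = 3.0 > 2.5 -> False
```

For deep networks no such closed form exists, which is why verification tools fall back on interval analysis and other relaxations.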

#### Robust by Construction

Add a noise layer à la differential privacy.
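The differential-privacy mechanism being alluded to is, in its simplest form, the Laplace mechanism: perturb an output with noise scaled to sensitivity/ε. A sketch of that noise layer (the counting query and all names are illustrative; a real deployment must bound the sensitivity rigorously):

```python
import numpy as np

def noisy_output(f, x, sensitivity, epsilon, rng):
    """Laplace-mechanism-style noise layer: perturb f's output with
    Laplace noise of scale sensitivity / epsilon, masking the exact
    contribution of any single input record."""
    return f(x) + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
data = np.array([1.0, 0.0, 1.0, 1.0])
count = lambda x: float(np.sum(x))  # counting query: sensitivity 1
release = noisy_output(count, data, sensitivity=1.0, epsilon=0.5, rng=rng)
```

The released value is unbiased on average but randomized per query, trading a little accuracy for robustness-by-construction guarantees.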