May 2021


Trust, alongside performance, is rapidly becoming a key operational requirement of AI models. Our research will focus on two key dimensions of trust:

  1. Explainability. The superior performance of AI models usually comes at the cost of opacity: without clear interpretations, humans cannot understand how the models arrive at their decisions. Concerns over the black-box nature of AI hamper its wider use in mission-critical applications such as finance and healthcare. High-quality explanations will enable us to understand and audit AI models. However, most current explanation methods lack the theoretical support needed to generate faithful, informative, and concise explanations. To this end, we will study explanations from three theoretical perspectives:

    i) Information theory to show how information flows through model architectures;
    ii) Game theory to identify coalitions that act as prototypes or patterns memorized by the models; and
    iii) Causality to distinguish causal from non-causal effects in explanations.

    Moreover, we propose using quantitative metrics (e.g., fidelity, accuracy, contrastivity, sanity checks) together with visual inspection to evaluate the quality of the generated explanations both quantitatively and qualitatively.
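To make the game-theoretic perspective and the fidelity metric more concrete, the following is a minimal sketch, not the project's actual method. It assumes a hypothetical three-feature `toy_model`, an all-zero `baseline` standing in for "absent" features, and exact Shapley values as one well-known game-theoretic attribution. The `deletion_fidelity` helper is likewise an illustrative assumption: it measures how much the model output drops when the highest-attributed features are replaced by their baseline values.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical model: linear terms plus one pairwise interaction."""
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[1] * x[2]


def masked(x, keep, baseline):
    """Keep the features in `keep`; replace every other feature by its baseline."""
    return [x[i] if i in keep else baseline[i] for i in range(len(x))]


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all coalitions (only feasible for few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = model(masked(x, set(coalition), baseline))
                with_i = model(masked(x, set(coalition) | {i}, baseline))
                phi[i] += weight * (with_i - without_i)
    return phi


def deletion_fidelity(model, x, baseline, attributions, k=1):
    """Drop in model output after deleting the k highest-attributed features."""
    ranked = sorted(range(len(x)), key=lambda i: abs(attributions[i]), reverse=True)
    keep = set(range(len(x))) - set(ranked[:k])
    return model(x) - model(masked(x, keep, baseline))


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
drop = deletion_fidelity(toy_model, x, baseline, phi, k=1)
print("attributions:", phi)
print("output drop after deleting the top feature:", drop)
```

In this toy setting the attributions sum to the difference between the model output at `x` and at the baseline, and deleting the top-ranked feature removes more of the output than deleting the lowest-ranked one. That is the kind of behavior a fidelity metric is meant to check; a real evaluation would need approximations, since exact enumeration grows exponentially with the number of features.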

  2. Robustness. AI models are often fragile and vulnerable to adversarial samples (e.g., image perturbations, malicious behaviors), which act as attacks that degrade model behavior and can cause privacy leakage. This poses a real threat to the deployment of AI models, so it is essential to enhance their robustness against such adversarial attacks. To this end, the key research tasks are:
    i) Adversarial attack to discover the weaknesses or backdoors of AI models, investigate failure cases, and quantify their vulnerability; and
    ii) Adversarial defense to learn how to identify and mitigate the negative influence of attacks.
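As an illustration of what an adversarial attack can look like in the simplest case, the sketch below applies the fast gradient sign method (FGSM), a standard one-step attack, to a hypothetical logistic-regression model. The weights `w`, bias `b`, input `x`, and budget `epsilon` are made-up values chosen for the example; this is a toy under stated assumptions, not the attack or defense the project will develop.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """Shift every feature by epsilon in the direction that increases the loss."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0, 0.5], 0.0   # hypothetical model parameters
x, y = [0.5, -0.5, 1.0], 1     # clean input with true label 1
epsilon = 0.6                  # hypothetical perturbation budget (max change per feature)

x_adv = fgsm(w, b, x, y, epsilon)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
print("clean probability of class 1:", round(p_clean, 3))
print("adversarial probability of class 1:", round(p_adv, 3))
```

With these values the clean input is classified as class 1, while the perturbed input falls below the 0.5 decision threshold even though no feature moved by more than `epsilon`. One common defense, adversarial training, would add such perturbed samples with their correct labels back into the training set; whether that suits a given model is an empirical question.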

For this research, we will build an AI accountability framework comprising a set of novel technologies and models that inspire trust and help audit AI models and their results. Taking event detection as an example, our framework will:
(1) audit the performance of the models by returning explanations of why the detected events are similar to the target event;
(2) provide early detection and monitoring of emerging events by highlighting the causes behind the detection results; and
(3) avoid model failure and event misdetection by ensuring stability against attacks.

As a result, our framework will benefit real-world applications in the Fintech, E-Commerce, and Healthcare domains.

