Visual Interpretability for Deep Learning: a Survey

Quanshi Zhang and Song-Chun Zhu
Frontiers of Information Technology & Electronic Engineering, Vol.19 No.1 page 27-39, 2018

You can download the paper here.

If you would like to recommend new papers or have any questions, please contact Dr. Quanshi Zhang.
We will use your suggestions to update the arXiv version of the paper.

Abstract. This paper reviews recent studies in understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, the interpretability is always the Achilles’ heel of deep neural networks. At present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. We believe that high model interpretability may help people to break several bottlenecks of deep learning, e.g. learning from very few annotations, learning via human-computer communications at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs), and we revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.

Survey. In this paper, we conduct a survey of current studies in understanding neural-network representations and learning neural networks with interpretable/disentangled representations. We can roughly divide the scope of the review into the following five research directions.

  1. Visualization of CNN representations in intermediate network layers. These methods mainly synthesize the image that maximizes the score of a given unit in a pre-trained CNN or invert feature maps of a conv-layer back to the input image.
  2. Diagnosis of CNN representations. Related studies may either diagnose a CNN’s feature space for different object categories or discover potential representation flaws in conv-layers.
  3. Disentanglement of “the mixture of patterns” encoded in each filter of CNNs. These studies mainly disentangle complex representations in conv-layers and transform network representations into interpretable graphs.
  4. Building explainable models. We discuss interpretable CNNs, capsule networks, interpretable R-CNNs, and the InfoGAN.
  5. Semantic-level middle-to-end learning via human-computer interaction. A clear semantic disentanglement of CNN representations may further enable “middle-to-end” learning of neural networks with weak supervision.
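
The first direction, synthesizing the image that maximizes the score of a given unit, can be illustrated with a minimal sketch. It assumes a toy stand-in for a CNN unit (a single linear filter response) so that the gradient can be written by hand; for a real pre-trained CNN the gradient would instead come from back-propagation. The names `unit_score` and `activation_maximization` and the L2 penalty weight `l2` are hypothetical and not taken from any of the surveyed papers.

```python
import numpy as np


def unit_score(image, weights):
    """Toy stand-in for a CNN unit: a linear filter response over the image."""
    return float(np.sum(weights * image))


def activation_maximization(weights, steps=200, lr=0.1, l2=0.5, seed=0):
    """Gradient ascent on the input image to maximize the unit's score.

    The objective is score(image) - l2 * ||image||^2. The L2 penalty is an
    assumed regularizer that keeps the synthesized image bounded.
    """
    rng = np.random.default_rng(seed)
    # Start from a small random image.
    image = rng.normal(scale=0.01, size=weights.shape)
    for _ in range(steps):
        # Gradient of the regularized objective with respect to the image.
        grad = weights - 2.0 * l2 * image
        image = image + lr * grad
    return image


if __name__ == "__main__":
    # A hypothetical 2x2 filter; the synthesized image should come to resemble it.
    toy_filter = np.array([[1.0, -2.0], [0.5, 3.0]])
    synthesized = activation_maximization(toy_filter)
    print(synthesized)
    print(unit_score(synthesized, toy_filter))
```

For this toy unit the synthesized image simply converges toward the filter itself, which is the pattern the unit responds to most strongly under the chosen penalty.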

Visualization of CNN representations

Visualization of filters in a CNN is the most direct way of exploring visual patterns hidden inside a neural unit. Different types of visualization methods have been developed for network visualization.


  1. Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
  2. Aravindh Mahendran and Andrea Vedaldi. Understanding deep image representations by inverting them. In CVPR, 2015.
  3. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: visualising image classification models and saliency maps. In arXiv:1312.6034, 2013.
  4. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: the all convolutional net. ICLR workshop, 2015.
  5. Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Feature visualization. Distill, 2017.
  6. Alexey Dosovitskiy and Thomas Brox. Inverting visual representations with convolutional networks. In CVPR, 2016.
  7. Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, and Jason Yosinski. Plug & play generative networks: Conditional iterative generation of images in latent space. CVPR, 2017.
  8. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene cnns. In ICLR, 2015.

Diagnosis of CNN representations

Some methods go beyond the visualization of CNNs and diagnose CNN representations to obtain an insightful understanding of the features encoded in a CNN. We roughly divide all relevant research into the following five directions.

  1. Studies in the first direction analyze CNN features from a global view.
  2. The second research direction extracts image regions that directly contribute to the network output for a label/attribute to explain CNN representations of the label/attribute.
  3. The estimation of vulnerable points in the feature space of a CNN is also a popular direction for diagnosing network representations.
  4. The fourth research direction is to refine network representations based on the analysis of network feature spaces.
  5. The fifth research direction is to discover potentially biased representations of a CNN.
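
As an illustration of the second direction, the sketch below estimates which image regions contribute to a model's output by occluding one patch at a time and recording the drop in score. This is a generic occlusion-based sketch under stated assumptions, not the method of any specific paper listed here: the model is a toy linear scorer, and the names `occlusion_map`, `patch`, and `fill` are hypothetical.

```python
import numpy as np


def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Return a heat map of score drops caused by occluding each patch.

    A large value means the occluded region contributed strongly to the score.
    """
    height, width = image.shape
    base_score = score_fn(image)
    heat = np.zeros((height, width))
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            occluded = image.copy()
            # Replace one patch with the fill value and re-score the image.
            occluded[top:top + patch, left:left + patch] = fill
            drop = base_score - score_fn(occluded)
            heat[top:top + patch, left:left + patch] = drop
    return heat


if __name__ == "__main__":
    # Toy scorer that only looks at the top-left 2x2 corner of a 4x4 image.
    weights = np.zeros((4, 4))
    weights[0:2, 0:2] = 1.0
    image = np.ones((4, 4))
    heat = occlusion_map(image, lambda x: float(np.sum(weights * x)))
    print(heat)
```

In this toy case only the top-left patch receives a non-zero value, since it is the only region the scorer depends on. With a real CNN, `score_fn` would be the network's output for the label or attribute of interest.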


  1. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In arXiv:1312.6199, 2014.
  2. Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In NIPS, 2014.
  3. Yao Lu. Unsupervised learning on neural network outputs. In arXiv:1506.00990v9, 2015.
  4. Mathieu Aubry and Bryan C. Russell. Understanding deep features with computer generated imagery. In ICCV, 2015.
  5. Ruth C. Fong and Andrea Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In ICCV, 2017.
  6. Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV, 2017.
  7. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “why should i trust you?” explaining the predictions of any classifier. In KDD, 2016.
  8. Luisa M. Zintgraf, Taco S. Cohen, Tameem Adel, and Max Welling. Visualizing deep neural network decisions: prediction difference analysis. In ICLR, 2017.
  9. Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, and Sven Dähne. Learning how to explain neural networks: Patternnet and patternattribution. In arXiv:1705.05598, 2017.
  10. Devinder Kumar, Alexander Wong, and Graham W. Taylor. Explaining the unexplained: A class-enhanced attentive response (clear) approach to understanding deep neural networks. In CVPR Workshop on Explainable Computer Vision and Job Candidate Screening Competition, 2017.
  11. Peng Wang, Qi Wu, Chunhua Shen, and Anton van den Hengel. The vqa-machine: Learning how to use existing vision algorithms to answer new questions. In CVPR, 2017.
  12. Yash Goyal, Akrit Mohapatra, Devi Parikh, and Dhruv Batra. Towards transparent ai systems: Interpreting visual question answering models. In arXiv:1608.08974, 2016.
  13. Jiawei Su, Danilo Vasconcellos Vargas, and Sakurai Kouichi. One pixel attack for fooling deep neural networks. In arXiv:1710.08864, 2017.
  14. Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In ICML, 2017.
  15. Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Eric Horvitz. Identifying unknown unknowns in the open world: Representations and policies for guided exploration. In AAAI, 2017.
  16. Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric P. Xing. Harnessing deep neural networks with logic rules. In ACL, 2016.
  17. Quanshi Zhang, Wenguan Wang, and Song-Chun Zhu. Examining cnn representations with respect to dataset bias. In AAAI, 2018.

Disentangling CNN representations into graphs/trees

Compared to the visualization and diagnosis of network representations in previous sections, disentangling CNN features into human-interpretable graphical representations or tree representations provides a more thorough explanation of network representations.


  1. Quanshi Zhang, Ruiming Cao, Ying Nian Wu, and Song-Chun Zhu. Growing interpretable part graphs on convnets via multi-shot learning. In AAAI, 2016.
  2. Quanshi Zhang, Ying Nian Wu, and Song-Chun Zhu. Interpretable convolutional neural network. In CVPR, 2018.
  3. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting cnn knowledge via an explanatory graph. In AAAI, 2018.
  4. Quanshi Zhang, Yu Yang, Ying Nian Wu, and Song-Chun Zhu. Interpreting cnns via decision trees. arXiv:1802.00121, 2018.

Learning networks with interpretable features

Almost all methods mentioned in previous sections focus on the understanding of a pre-trained network. Here, we review studies of learning disentangled representations of neural networks, where representations in middle layers are no longer a black box but have clear semantic meanings. Compared to the understanding of pre-trained networks, learning networks with disentangled representations presents more challenges. Up to now, only a few studies have been published in this direction.


  1. Quanshi Zhang, Ying Nian Wu, and Song-Chun Zhu. Interpretable convolutional neural network. In CVPR, 2018.
  2. Tianfu Wu, Xilai Li, Xi Song, Wei Sun, Liang Dong, and Bo Li. Interpretable r-cnn. In arXiv:1711.05226, 2017.
  3. Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. Dynamic routing between capsules. In NIPS, 2017.
  4. Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, 2016.

Interpretability-based middle-to-end learning

People may either disentangle the representations of a pre-trained CNN or learn a new network with interpretable, disentangled representations. Such interpretable/disentangled network representations can further enable middle-to-end model learning at the semantic level without strong supervision.


  1. Quanshi Zhang, Ruiming Cao, Ying Nian Wu, and Song-Chun Zhu. Mining object parts from cnns via active question-answering. In CVPR, 2017.
  2. Quanshi Zhang, Ruiming Cao, Shengming Zhang, Mark Edmonds, Ying Nian Wu, and Song-Chun Zhu. Interactively transferring cnn patterns for part localization. In arXiv:1708.01783, 2017.
  3. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting cnn knowledge via an explanatory graph. In AAAI, 2018.