North Carolina State University researchers have developed a new framework for building deep neural networks via grammar-guided network generators. In experimental testing, the new networks, called AOGNets, have outperformed existing state-of-the-art frameworks, including the widely used ResNet and DenseNet systems, in visual recognition tasks.
"AOGNets have better prediction accuracy than any of the networks we've compared it to," says Tianfu Wu, an assistant professor of electrical and computer engineering at NC State and corresponding author of a paper on the work. "AOGNets are also more interpretable, meaning users can see how the system reaches its conclusions."
The new framework uses a compositional grammar approach to system architecture that draws on best practices from previous network systems to more effectively extract useful information from raw data.
"We found that hierarchical and compositional grammar gave us a simple, elegant way to unify the approaches taken by previous system architectures, and to our best knowledge, it is the first work that makes use of grammar for network generation," Wu says.
To test their new framework, the researchers developed AOGNets and evaluated them on three image classification benchmarks: CIFAR-10, CIFAR-100 and ImageNet-1K.
"AOGNets obtained significantly better performance than all of the state-of-the-art networks under fair comparisons, including ResNets, DenseNets, ResNeXts and DualPathNets," Wu says. "AOGNets also obtained the best model interpretability score using the network dissection metric in ImageNet. AOGNets further show great potential in adversarial defense and platform-agnostic deployment (mobile vs cloud)."
The researchers also tested the performance of AOGNets in object detection and instance semantic segmentation on the Microsoft COCO benchmark, using the vanilla Mask R-CNN system.
"AOGNets obtained better results than the ResNet and ResNeXt backbones with smaller model sizes and similar or slightly better inference time," Wu says. "The results show the effectiveness of AOGNets learning better features in object detection and segmentation tasks.
These tests are relevant because image classification is one of the core tasks in visual recognition, and ImageNet is the standard large-scale classification benchmark. Similarly, object detection and segmentation are two core high-level vision tasks, and MS-COCO is one of the most widely used benchmarks.
"To evaluate new network architectures for deep learning in visual recognition, they are the golden testbeds," Wu says. "AOGNets are developed under a principled grammar framework and obtain significant improvement in both ImageNet and MS-COCO, thus showing potentially broad and deep impacts for representation learning in numerous practical applications.
"We're excited about the grammar-guided AOGNet framework, and are exploring its performance in other deep learning applications, such as deep natural language understanding, deep generative learning and deep reinforcement learning," Wu says.
The paper, "AOGNets: Compositional Grammatical Architectures for Deep Learning," will be presented at the IEEE Computer Vision and Pattern Recognition Conference, being held June 16-20 in Long Beach, Calif. First author of the paper is Xilai Li, a Ph.D. student at NC State. The paper was co-authored by Xi Song, an independent researcher.