JOIV : Int. Inform. Visualization, 9. - March 2025 592-598

INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION
journal homepage : w. org/index. php/joiv

Maize Leaf Disease Identification with Large and Lightweight Convolutional Neural Models

Mohd Haris Lye a,*, Mohammad Faizal Ahmad Fauzi a, Lim Kian Ming b

a Faculty of Engineering, Multimedia University, Persiaran Multimedia, Cyberjaya, Malaysia
b Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka, Malaysia
Corresponding author: *haris. lye@mmu.

Abstract: To minimize yield losses in maize plantations, control measures that include early leaf disease detection are essential. In this study, we evaluated large and lightweight convolutional neural network (CNN) models for accurately classifying maize diseases from leaf images. To achieve high image classification performance, existing deep learning approaches often use large models that require substantial computational resources. Simpler, lightweight models provide faster inference, but at the expense of lower prediction accuracy. To improve maize leaf disease classification on the lightweight SqueezeNet model, the response-based knowledge distillation method was evaluated for model training. In response-based knowledge distillation, the logit output from the last layer of the large model is used in the loss function to train the lightweight model. This enables the lightweight model to learn from the knowledge of large and complex models, improving its predictive accuracy while keeping a simpler architecture and faster inference. A six-class maize disease dataset was prepared from two publicly available datasets and used to train and evaluate the selected large and lightweight models. Both the large and the lightweight models demonstrated high classification accuracy when trained for 40 epochs.
The trained SqueezeNet model showed promising performance in identifying various maize leaf diseases, with an accuracy of 96. When the model is trained with the response-based knowledge distillation method, the test accuracy improves to 97. Such lightweight models with high accuracy can facilitate deployment on resource-constrained devices.

Keywords: Knowledge distillation; image classification; lightweight model; plant disease; deep neural network.

Manuscript received 10 Oct. revised 17 Dec. accepted 5 Mar. Date of publication 31 Mar.
International Journal on Informatics Visualization is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

papers on deep-learning applications for plant disease. Many techniques with good accuracy have been proposed. However, most rely on complex deep-learning models. In addition, when a model was trained and evaluated on different datasets involving field images, its performance dropped significantly.

Several papers have shown the effectiveness of convolutional neural networks in maize disease identification. The Inception CNN model was evaluated on a field-collected maize image dataset and obtained an accuracy of 96%, with data augmentation based on brightness changes. Recent methods have proposed lightweight CNN models with attention blocks. The smaller CNN model outperforms the mainstream larger models in predicting maize leaf diseases. A vision transformer with reduced model complexity has also been proposed for maize disease identification. Another paper addresses maize disease severity estimation with a two-stage segmentation procedure. In the first stage, the leaves were segmented. The identified region of

I. INTRODUCTION

Maize is one of the top three most important food crops, along with rice and wheat. It is a significant food source for humans and livestock. However, maize and other crops are easily infected by diseases that reduce their yield and thus threaten food security.
A recent study showed that pests and diseases contribute to a mean yield loss of 22.6% in maize plantations. Current practice requires farmers to manually inspect their crops for signs of disease, which usually manifest as visual patterns on the plant's leaves. This tedious process is not practical for large farms. Many studies have been conducted on automated disease identification in maize crops. The most effective approach is deep learning, which leverages high-performing image recognition models such as the convolutional neural network (CNN) and the more recent vision transformer. A survey paper reviewed 70 recent

5% to 4. The student model was distilled from various large models, namely ResNet50, DenseNet121, and Xception. In another work, maize disease identification on the Plant Village dataset is performed with an object detection approach. The YOLOv5s model is simplified to reduce its parameter count. To enhance its performance, channel-wise knowledge distillation is used. After training with knowledge distillation from the bigger YOLOv5m model, the mean average precision (mAP) improved by 3.8%, and the parameter size decreased by 15. Similar results on knowledge distillation with object detection methods for plant disease have also been reported. In a related work, knowledge distillation with data augmentation was applied to the Plant Doc dataset.

The literature shows that there is still limited work applying knowledge distillation to plant disease detection. Most papers use only clean image datasets captured in the lab. However, plant diseases should ideally be identified in the field, and field images are challenging to process because of cluttered background objects. This study addresses the problem by testing a small model on field-collected images.
In this paper, we evaluate the performance of large and small, lightweight CNN models and assess the impact of using the basic response-based knowledge distillation method to train a small CNN model. The large models are DenseNet-121 and ResNet-50, while the two lightweight models selected for evaluation are MobileNet and SqueezeNet. Knowledge distillation is used with the SqueezeNet model.

DenseNet-121 and ResNet-50 have 121 and 50 layers, respectively. Both models use skip connections to build deep networks that mitigate the vanishing gradient problem. In DenseNet, each layer has skip connections to all other layers in a feed-forward manner. These large and deep models have been used successfully in many computer vision applications and demonstrate impressive performance. However, they are not suitable for low-end edge devices because they require large memory space and incur high latency during inference. To address this issue, a number of simpler models, such as MobileNet and SqueezeNet, have been proposed.

SqueezeNet is a convolutional neural network with 18 layers. The model's size is less than 0.5 MB. This significant reduction in complexity is achieved with squeeze and expansion layers. The squeeze layer, consisting of 1x1 convolution filters, reduces the number of channels. The expansion layer, using 1x1 and 3x3 convolutional filters, increases the number of channels. The squeeze and expansion layers are grouped into a module. These modules, known as fire modules, are stacked to form the SqueezeNet architecture.

MobileNet introduces a layer that uses depthwise separable convolutions to reduce model parameters. This layer replaces the standard convolution operation with a depthwise convolution followed by a pointwise convolution. This strategy has been shown to significantly reduce model parameters and increase computation speed with only a minor impact on predictive accuracy.
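As a rough illustration of why the fire module and the depthwise separable layer shrink a model, the sketch below counts convolution weights for each design. The channel sizes (128 input and output channels, a squeeze width of 16) and the function names are hypothetical and are not taken from the paper; biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1x1 convolution that mixes the channels
    return depthwise + pointwise


def fire_module_params(c_in, squeeze, expand1x1, expand3x3):
    """Weights in a fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    squeeze_w = standard_conv_params(c_in, squeeze, 1)
    expand1_w = standard_conv_params(squeeze, expand1x1, 1)
    expand3_w = standard_conv_params(squeeze, expand3x3, 3)
    return squeeze_w + expand1_w + expand3_w


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    c_in, c_out, k = 128, 128, 3
    print("standard 3x3 conv:  ", standard_conv_params(c_in, c_out, k))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))
    print("fire module:        ", fire_module_params(c_in, 16, 64, 64))
```

With these illustrative sizes, the standard 3x3 layer holds 147,456 weights, the depthwise separable version 17,536, and the fire module (which also outputs 128 channels) 12,288.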
A comparison of the number of parameters, model file size, and inference time per image is shown in Table 1. The size of the lightweight model is significantly smaller than that of the large model, such as

interest was then segmented to identify the lesion region. Severity was predicted from the percentage of the leaf covered by the lesion associated with the predicted disease. Three types of maize disease were considered: gray leaf spot (GLS), northern leaf blight (NLB), and northern leaf spot (NLS). The proposed method was evaluated using field images.

Applying deep learning models on low-cost edge devices in agricultural settings necessitates reducing model size. This requirement arises from the resource-constrained nature of these devices, which are characterized by limited computational capabilities and stringent memory constraints. However, a simple model provides lower accuracy because of its limited representation capacity. Consequently, techniques such as knowledge distillation for model compression have attracted significant attention as a way to improve the predictive accuracy of small models and reduce the computational burden imposed by complex deep learning architectures.

Knowledge distillation (KD) is a technique for transferring knowledge from a larger deep-learning model to a smaller, lightweight model. In this scheme, the teacher is the large model and the student is the smaller model. Knowledge distillation typically involves the following steps. First, the teacher model is trained on the target dataset to achieve high generalization performance. The trained teacher then generates soft targets, from its output or intermediate features, for teaching the student model. This has been shown to improve student performance because the soft target contains useful information that complements the hard target obtained from the ground truth label. In this way, the student is trained to imitate the teacher's behavior.
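The teacher-student scheme described above can be sketched in plain Python. This is a minimal sketch that assumes the common response-based formulation: temperature-scaled softmax outputs, a Kullback-Leibler term between teacher and student, and a cross-entropy term on the ground-truth label, mixed by a weight `alpha`. The function names, the example logits, and `alpha = 0.5` are illustrative assumptions; the default temperature of 4 mirrors the value reported later in the experiments. Some implementations also multiply the KL term by T squared, which is omitted here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a smoother output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                          # subtract the max for stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of predicted probabilities against an integer class label."""
    return -math.log(probs[label] + eps)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of the soft-target (KL) loss and the hard-target (CE) loss.

    alpha is an assumed mixing weight; it is not a value taken from the paper.
    """
    soft_teacher = softmax(teacher_logits, temperature)   # soft target
    soft_student = softmax(student_logits, temperature)
    kl = kl_divergence(soft_teacher, soft_student)
    ce = cross_entropy(softmax(student_logits), label)    # hard target
    return alpha * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    teacher = [6.0, 2.0, 1.0]     # hypothetical teacher logits, three classes
    student = [2.5, 1.5, 0.5]     # hypothetical student logits
    print("soft target at T=1:", softmax(teacher, 1.0))
    print("soft target at T=4:", softmax(teacher, 4.0))
    print("distillation loss: ", distillation_loss(student, teacher, label=0))
```

Printing the soft target at both temperatures shows how a higher T spreads probability mass onto the non-target classes, which is the extra information the student learns from.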
Knowledge distillation differs from transfer learning, where the weights of a model trained on a large dataset are used to initialize a similar model for related tasks. The model is then fine-tuned on a smaller target dataset. Many studies have shown that this improves performance compared with a randomly initialized model.

There are two major approaches to knowledge distillation. They differ in the information source leveraged from the pre-trained teacher model for training the student model. One approach taps into the teacher's output probabilities and is known as the logit-based or response-based method. The other approach relies on the teacher's intermediate representations and is often referred to as the hint-based method. Although many new knowledge distillation methods have been proposed, recent research shows that the standard, or vanilla, knowledge distillation method still performs well if strong data augmentation and a large dataset are used during training.

Knowledge distillation has been applied in various areas of plant disease identification. In one study, a small CNN model, the tiny MobileNet-v2 with 25% of the original model size, is used for plant disease recognition on the 39 classes of the Plant Village dataset. The model trained with standard response-based knowledge distillation from a large CNN model showed improved classification performance. The large EfficientNet and Xception CNN models were used as teachers. However, this was only demonstrated on a clean, lab-prepared dataset. Another paper shows that, for a student CNN model composed of just four layers with 3.71 million parameters, standard KD can improve accuracy ranging from

ResNet-50, and thus it is feasible for use on low-end edge devices such as the Raspberry Pi, where processor speed is low and memory size is restricted. The architectures of the large and small lightweight models used are shown in Figures 1-4.

Fig. 1 DenseNet architecture diagram.

Fig.
4 SqueezeNet architecture diagram.

TABLE I
NUMBER OF PARAMETERS, FILE SIZE AND MODEL INFERENCE TIME FOR THE CNN MODELS EVALUATED IN THE EXPERIMENT
Columns: Model, Number of Parameters, File Size (MB), Model Inference Time
Rows: MobileNet-V2, SqueezeNet1.1, ResNet-50, DenseNet-121

Fig. 2 ResNet50 architecture diagram.

II. MATERIALS AND METHOD

Method
We use the standard knowledge distillation approach to improve the small, lightweight CNN model. This enables the lightweight model to learn from both the actual label and the response of the complex model. Here the lightweight model is the student and the deep, complex model is the teacher. The student model is trained with both the student classification loss and the distillation loss. The distillation loss uses the Kullback-Leibler (KL) divergence shown in equation (1). It measures the difference between the distribution of the soft targets generated from the teacher model output, p_t, and the predicted probabilities of the student model, p_s.

    KL(p_t \| p_s) = \sum_{i} p_{t,i} \log \frac{p_{t,i}}{p_{s,i}}    (1)

The softmax function parameterized by the temperature T is used to obtain class probability estimates for both the teacher (soft target) and the student. A higher temperature yields a smoother class probability distribution, which can aid in transferring the teacher's knowledge to the student model.

Fig. 3 MobileNet architecture diagram.

Let z_i be the logit output for class i. The class probability p_i(T) of the image is given by equation (2).

    p_i(T) = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}    (2)

When all classes are considered, the class probabilities form the vectors p_s and p_t used in the Kullback-Leibler function in equation (1).

In standard knowledge distillation, the student model is trained with a loss function that is the weighted sum of the Kullback-Leibler divergence (KL) loss and the cross-entropy (CE) loss, as shown in equation (3). The training scheme is illustrated in Figure 5. Training involves two stages. In the first stage, the teacher is trained on the selected dataset.
In the second stage, the teacher's output on the training images is used as the soft target. The hard target is obtained from the ground truth label y and is used jointly with the soft target to train the student model.

    L = \alpha \, KL(p_t \| p_s) + (1 - \alpha) \, CE(y, p_s)    (3)

where \alpha is the distillation loss weight.

Fig. 5 Visualization of the student model training scheme based on standard response-based knowledge distillation.

Fig. 6 Sample images from the training set for the 6 classes of the CDS Makerere dataset. Each row shows 3 sample images for one class. The classes, from the top row, are GLS, healthy, MLB, MSV, NLB, and NLS.

III. RESULTS AND DISCUSSION

Dataset
To evaluate the models, we prepared a maize image dataset by combining the publicly available CDS and Makerere maize plant disease datasets into a new six-class dataset named the CDS Makerere maize disease dataset. The CDS dataset contains images from three disease classes: Gray Leaf Spot (GLS), Northern Leaf Blight (NLB), and Northern Leaf Spot (NLS). The Makerere dataset has two maize disease classes, Maize Leaf Blight (MLB) and Maize Streak Virus (MSV), plus one healthy class. Since the Makerere dataset contains many more images than the CDS dataset, an approximately equal number of images is sampled from each class to match the class distribution of the CDS dataset. The six-class dataset therefore consists of five maize disease classes and one healthy class. The detailed distribution of the classes is presented in Table 2. Sample images from each class are shown in Figure 6. The field-collected images are challenging due to high interclass similarities, low intraclass differences, and cluttered backgrounds.

Experiment Setup
The experiment aims to evaluate the performance of large and lightweight models in maize disease identification from leaf images. DenseNet-121 and ResNet-50 are selected as the large models. MobileNetV2 and SqueezeNet1.1 are the selected versions of the lightweight models.
The models are all developed with PyTorch 1. 1 software. The experiment was conducted on an x86 Intel-based desktop computer with 32 GB of memory, equipped with an Nvidia RTX 3070 GPU with 8 GB of memory. The images from the CDS Makerere dataset are used in this study. The training images are augmented with a transformation pipeline comprising resizing to 224x224 pixels, random horizontal flipping, and random rotations within ±15 degrees. All experiments are executed with the random seed 41 to ensure reproducible results. All the models are initialized with weights pre-trained on the ImageNet dataset, following the transfer learning method. The pre-trained weights are obtained from PyTorch Torchvision. Training involves fine-tuning all the layers of the models. The AdamW optimizer is used with a learning rate of 0001 and a batch size of 64, and the models are trained for 40 epochs.

TABLE II
DISTRIBUTION OF TRAINING AND TEST IMAGES FOR THE 6 CLASSES CDS MAKERERE DATASET
Columns: Class Name, Total, #Training Images, #Test Images

Results
The first experiment trains the DenseNet-121, ResNet-50, MobileNet, and SqueezeNet models on the CDS Makerere training images. The models are then evaluated on the test images. The classification accuracy of all the evaluated models on the CDS Makerere test set is shown in Table 3. The DenseNet-121 model shows the highest accuracy of 99. It is worth noting that the lightweight MobileNetV2 model attains higher accuracy than the larger ResNet-50 model.

predicts a positive class or a negative class. This is repeated for each class in the test set. For each binary classification, the predicted output is labeled as a true positive, true negative, false positive, or false negative. The total counts of true positive, true negative, false negative, and false positive labels are denoted TP, TN, FN, and FP, respectively. These counts are used to compute the class-wise Precision, Recall, and F1 values that serve as the model performance metrics, as shown in equations (4), (5), and (6).
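The one-vs-rest counting described above can be sketched as follows. The 3x3 confusion matrix is made-up toy data, not a result from the paper, and the function name is hypothetical; rows are taken as true classes and columns as predicted classes.

```python
def per_class_metrics(confusion):
    """Class-wise precision, recall and F1 from a square confusion matrix.

    confusion[r][c] is the number of samples whose true class is r and whose
    predicted class is c. Each class is scored as its own binary problem.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # missed samples of c
        fp = sum(confusion[r][c] for r in range(n)) - tp     # wrongly called c
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"TP": tp, "TN": tn, "FP": fp, "FN": fn,
                        "precision": precision, "recall": recall, "f1": f1})
    return results


if __name__ == "__main__":
    # Toy three-class confusion matrix, for illustration only.
    toy = [[8, 2, 0],
           [1, 9, 0],
           [0, 0, 10]]
    for idx, m in enumerate(per_class_metrics(toy)):
        print(idx, round(m["precision"], 3), round(m["recall"], 3), round(m["f1"], 3))
```

Because every class is scored against all the others, a class can have high recall but lower precision when other classes are frequently mistaken for it, which is why the three metrics are reported together.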
TABLE III
CLASSIFICATION ACCURACY FOR VARIOUS CNN MODELS ON THE CDS MAKERERE MAIZE DISEASE DATASET
Columns: CNN Model, Classification Accuracy (%)
Rows: DenseNet-121, ResNet-50, MobileNet (mobilenet_v2), SqueezeNet, SqueezeNetKD
The name in brackets shows the exact version used. SqueezeNetKD is a SqueezeNet trained with standard knowledge distillation.

In the second experiment, the best large model, DenseNet-121, is used to train the SqueezeNet model with the standard knowledge distillation method. DenseNet-121 is the teacher model and SqueezeNet is the student model. The same CNN training parameters are used as in the first experiment. The aim is to evaluate whether knowledge distillation can increase the accuracy of the SqueezeNet model. In knowledge distillation, the logit output from the teacher is used as a soft target to train the student model with the custom training loss function given in equation (3). The training loss depends on two parameters, the temperature T and the distillation loss weight α, as shown in equation (3). The cited paper recommends a range of values for T. To achieve a smoother soft target that aids student model learning, a temperature value of 4 (T = 4) is used. Knowledge distillation with three values of α is compared to evaluate the effect of the distillation loss weight. The results are tabulated in Table 4.

    Precision = \frac{TP}{TP + FP}    (4)

    Recall = \frac{TP}{TP + FN}    (5)

    F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}    (6)

Figure 7 displays the confusion matrix of SqueezeNetKD on the CDS Makerere test images, showing highly accurate predictions across all classes. The analysis indicates strong Precision, Recall, and F1 scores for all classes (see Table 5). Figure 8 shows the training and test loss of SqueezeNetKD across epochs, demonstrating convergence to stable training and test loss values. Figure 9 illustrates SqueezeNetKD's test accuracy, which rises rapidly in the first five epochs and stabilizes after 30 epochs.

TABLE V
REPORT ON PRECISION,
RECALL AND F1 SCORE FOR SQUEEZENETKD MODEL TESTED ON THE CDS MAKERERE DATASET
Columns: Class, Precision, Recall, F1-Score

TABLE IV
IMPACT OF DISTILLATION LOSS WEIGHT α ON STUDENT MODEL CLASSIFICATION ACCURACY ON CDS MAKERERE DATASET
Columns: Alpha, Student Accuracy (%)
DenseNet-121 serves as the teacher model, and SqueezeNet as the student model.

Higher distillation loss weights improve classification performance, demonstrating the advantage of using the logits of a stronger model to train a weaker model during knowledge distillation. Using the optimal loss weight improves SqueezeNet's accuracy over standard cross-entropy training. The SqueezeNet model trained with knowledge distillation, using the optimal α and DenseNet-121 as the teacher, will be referred to as the SqueezeNetKD model.

To assess the performance of SqueezeNetKD for each class, the model performance is visualized in a confusion matrix. The confusion matrix shows, for each true label, how many predictions fall on the correct and on the incorrect classes. In this way, the model's confusion between classes can be examined. The prediction accuracy for each class can be quantified as well. This is achieved by computing the class-wise precision, recall, and F1 score. In this setup, the multi-class classification is framed as a set of separate binary classifications, one per class. Thus, for each sample in the test set, the model

Fig. 7 Confusion matrix for SqueezeNetKD predictions on the CDS Makerere test set.

Fig. 8 Training and test loss for the SqueezeNetKD model evaluated after various training epochs. The model is evaluated on the CDS Makerere dataset.

Fig. 9 Test accuracy for the SqueezeNetKD model evaluated after various training epochs. The model is tested on the CDS Makerere dataset.

IV. CONCLUSION

In this paper, we evaluated large and lightweight convolutional neural network models for accurately recognizing maize leaf diseases from leaf images. Our experiments demonstrated the promising performance of the deep and large CNN models,
DenseNet-121 and ResNet50, in accurately classifying various maize leaf diseases. However, the deployment of these large models on resource-limited edge devices in agricultural settings is hindered by their substantial computational and memory requirements. To address this challenge, we evaluated lightweight models and compared their performance with that of the larger models. The results show that a small model can attain accuracy comparable to the large models on the proposed maize leaf disease dataset.

Knowledge distillation is used to train the SqueezeNet model to improve the performance of the small model. This enables knowledge transfer from the best-performing complex model, DenseNet-121, to the smaller SqueezeNet model. Specifically, we trained the lightweight SqueezeNet model using the standard response-based knowledge distillation method, with DenseNet-121 as the teacher model. The results show that knowledge distillation improves the classification accuracy of the SqueezeNet model, narrowing the performance gap with the larger models. The SqueezeNet model trained with knowledge distillation achieved an accuracy of 97.13% on the CDS Makerere dataset, outperforming the standard SqueezeNet model trained solely on ground truth labels.

The successful application of knowledge distillation in this study highlights its potential for developing accurate and efficient lightweight models for plant disease identification in the field on resource-constrained edge devices. By leveraging the knowledge of large and complex models, lightweight models can achieve comparable performance while maintaining a smaller memory footprint and faster inference times. Although our study focused on maize leaf diseases, the knowledge distillation method can be applied to other plant species and disease categories, fostering the development of practical and affordable solutions for sustainable agriculture.

ACKNOWLEDGMENT