CNN model

 

 


Introduction

The COVID-19 outbreak has shifted the way scientific intelligence is applied to current and emerging challenges. Preventative measures such as face masks have been essential in stopping droplets from passing through or escaping the mask (Hassen & Adane, 2021). Most governments around the globe have made mask use mandatory in certain situations, either via national law or, as in the United States, on a state-by-state basis. Based on multiple studies, health officials believe that wearing masks may help prevent the spread of COVID-19, especially for individuals who come into close contact with sick patients (Bennett, 2021). The degree of protection depends on several variables, including the kind of mask used, the quantity of virus present, and the environment. If masks are beneficial, a larger number of people complying with mask regulations will yield a greater collective health benefit; in other words, the pandemic's influence on a country's social and economic well-being is closely tied to how successfully the public health response is followed. Non-compliance with mask rules is comparatively high in some nations, such as the United States. Bennett (2021) observes that, despite mask regulations in numerous states and recommendations from the Centers for Disease Control and Prevention (CDC), many people opt not to use face coverings to protect themselves and others against COVID-19. Face mask adherence may be improved by technological innovation, particularly machine learning techniques, which are effective in face localization using convolutional neural networks and can be applied to increase compliance (Klyuzhin et al., 2020).

The main objective of this study is to build a model that can distinguish between people who are wearing masks, those who are not, and those who are wearing masks incorrectly. The project seeks to develop a highly accurate, real-time system that can recognize these three cases in public and thereby help enforce the relevant health rules on face masks. Face detection and identification algorithms have been developed by many academics; however, there is a fundamental distinction between detecting individuals who are wearing masks, those who are not, and those who are wearing masks incorrectly (Kumar, 2016). According to the existing literature, few studies have tried to identify people wearing masks. The goal of this research is to create methods for detecting people wearing masks over their faces in public places such as theaters and shopping malls. Distinguishing a face with a mask from one without in public is difficult because the available datasets for identifying masks on human faces are rather small. The collection used here includes all kinds of face photos, both with and without masks.

Literature review

According to a study by Ejaz, Islam, Sifatullah, and Sarker (2019), Principal Component Analysis (PCA) is a method that can be used to quickly identify or verify a person, although it can be time-consuming. Their work confirms that facial recognition is a complex process due to the prevalence of occlusions such as eyeglasses, veils, scarves, and other make-up or disguising materials, which hinder face detection. Non-masked facial recognition algorithms developed recently are frequently utilized and provide improved performance, according to Ejaz et al. (2019); however, little progress has been made in masked face recognition. Consequently, their study adopted an analytical strategy that may be applied to non-masked, improperly masked, and masked face recognition. PCA is one of the most effective and most extensively used statistical methods (Ejaz et al., 2019), and it was therefore selected for that study. Comparative studies were also carried out to gain a better understanding of this topic.

Venkateswarlu et al. (2020) proposed a pre-trained MobileNet that integrates a global pooling block for detecting face masks. The pre-trained MobileNet builds a multi-dimensional feature map from a color image. The global pooling block then transforms that feature map into a 64-element feature vector, and finally a softmax layer uses the 64 features for binary classification. They tested their approach on two publicly available datasets, achieving 99.9% and 100% accuracy on DS1 and DS2, respectively. The model minimizes overfitting by using a global pooling block, and it surpasses previous models in parameter count and training time. This model, however, is unable to recognize multiple face masks at the same time.

Convolutional neural networks (CNNs) have had a significant impact on computer vision, according to a study by Huang et al. (2020). To handle large computations, the bulk of current CNNs depend primarily on costly GPUs (graphics processing units). As a result, CNNs have yet to be extensively used in manufacturing for inspecting surface flaws. The CNN-based model developed in that research delivers strong performance on microscopic defect detection while running on low-frequency CPUs (central processing units), which is the goal of the work (Huang et al., 2020). The Huang et al. (2020) model consists of a decoder and a lightweight (LW) bottleneck. In their experiments, the researchers found that CNNs can be small and hardware-friendly, making them suitable for future developments in automated defect identification (Kumar, 2016).

One of the most promising approaches to human face recognition was developed by Lawrence et al. (1997), who used a hybrid neural network. Their system combines a self-organizing map (SOM) and convolutional neural networks with local image sampling. Convolutional neural networks provide partial invariance to translation, rotation, scaling, and deformation, whereas SOMs provide dimensionality reduction and invariance to slight changes in the image sample, making the convolutional networks easier to train and test. Across its successive layers, the convolutional network extracts increasingly larger features (Klyuzhin et al., 2020).

Summary of the reviewed literature

In summary, it is clear from the examined research that face recognition systems are capable of detecting partly obstructed faces. The degree of occlusion in four areas (the nose, mouth, chin, and eyes) is used to distinguish between annotated masks and hand-covered faces. Because of this, a face is only labeled "with mask" when a mask covers everything from the nose to the chin. Identifying those who have violated COVID regulations enables an improved face mask detection method. Implemented correctly, a face mask detection system can help ensure our safety and that of others (Nieto-Rodríguez et al., 2015). This method not only achieves high accuracy but also significantly speeds up face detection. The system could be used in a variety of locations, including subway stations, markets, schools, train stations, and airports (Tan, 2007). Finally, this study may serve as a reference for other scientists in the future. The model is compatible with any HD camera, so it is not limited to face mask detection; for instance, biometric scans may be performed even while a mask is worn.

The system has reached a respectable level of accuracy while relying on simple machine learning tools and methodologies, and a wide range of uses is possible. In light of the COVID-19 situation, wearing a mask may become mandatory in the near future, and clients may be required to put on proper face masks to use the services of several public service providers (Kaur et al., 2022). Putting the model in place would have a significant impact on the public health system. Detecting whether a person is wearing the mask correctly might be added to this system in the future, and the model may be further refined to determine whether a surgical or N95-type mask is being worn and thus whether the mask is susceptible to viruses.

Solution Reporting

Planned research, methodology, and evaluation methods

The planned research involved the development of a Convolutional Neural Network (CNN) model using TensorFlow with the Keras library and OpenCV to recognize people who are wearing masks, those wearing masks improperly, and those not wearing masks at all. The Face Mask Detection | Kaggle dataset is used to construct this model. Each of the 853 photos in this dataset is classified as with, without, or incorrectly wearing a face mask. TensorFlow is used to develop a CNN model that can tell from these photos whether someone is wearing a face mask. The pictures are divided into two sets, a training dataset and a test dataset, containing 80% and 20% of the total images, respectively. There is a multitude of ways to build bounding boxes, often called "data annotations," around a selected area; in the proposed model, images are labeled as "with mask," "without mask," and "improper mask" using the LabelImg tool. Pre-processing and segmentation techniques are then used to enhance the image's focus on the foreground objects.
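The 80/20 partition described above can be sketched as a deterministic shuffle-and-slice; the filenames below are placeholders, and the project's actual split routine (e.g. Keras utilities) may differ:

```python
import random

def split_dataset(filenames, train_fraction=0.8, seed=42):
    """Shuffle the image list deterministically and slice it into train/test subsets."""
    files = list(filenames)
    random.Random(seed).shuffle(files)       # fixed seed keeps the split reproducible
    cut = int(len(files) * train_fraction)   # index of the 80% boundary
    return files[:cut], files[cut:]

# The Kaggle set contains 853 annotated images (placeholder filenames here).
images = [f"image_{i}.png" for i in range(853)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))  # 682 171
```

Shuffling before slicing matters because the raw dataset may group the three classes together, which would otherwise skew the test set.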

The implementation of this model involved running the trials on an Intel Core i7 CPU with an Nvidia GTX 1080 graphics card under Windows 10. The system uses Python 3.5 as its programming language and relies on the PyTorch module, as well as MATLAB 2019, to handle and analyze embedded images. The pre-trained model may be used with 224 × 224 × 3 input frames.

Once the model has been trained, it may be used to predict whether a person has worn a mask appropriately. Using Google Colab, an online GPU environment, the system is built to differentiate between people who are wearing masks, those who are not, and those who are wearing masks inappropriately. A folder called the "trained folder" is employed for training, and a test folder is created to check whether the model can distinguish between masks and no-masks in the original photographs. As training proceeds, the learning-rate scaling factor decreases by a factor of 0.9 every 10 iterations, and the Adam optimizer uses a momentum value of 0.999. The training procedure is repeated until 100 epochs have elapsed.
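A minimal sketch of the schedule described above, assuming "decreases by a factor of 0.9 every 10 iterations" means a step decay applied to the base learning rate (in Keras this function could be wired up through a LearningRateScheduler callback; the base rate of 1e-3 is an illustrative assumption):

```python
def stepped_lr(base_lr, step, decay=0.9, every=10):
    """Learning rate after `step` iterations, reduced by `decay` every `every` steps."""
    return base_lr * decay ** (step // every)

# Rate stays flat within each 10-iteration window, then drops by 10%.
for step in (0, 10, 25):
    print(step, stepped_lr(1e-3, step))
```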

After training the classifier, a reliable face detection model is necessary for the system to distinguish between those who are wearing masks, those who are not, and those who are wearing them inappropriately (Li et al., 2020). The goal of this work is to improve mask detection accuracy while using fewer resources. This job is done with OpenCV's DNN component, which includes an object detection model called the Single Shot Multibox Detector (SSD); the SSD model relies on ResNet-10 as the architecture's backbone. Even embedded devices such as the Raspberry Pi can benefit from this kind of face detection.
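The post-processing that turns the SSD detector's raw output into usable face boxes can be sketched as follows. It assumes, as with OpenCV's ResNet-10 SSD face detector, that each detection carries a confidence score and box corners normalized to [0, 1]; the 0.5 threshold is an illustrative choice, not a value reported in this work:

```python
def filter_detections(detections, frame_w, frame_h, conf_threshold=0.5):
    """Keep detections above the confidence threshold and scale the
    normalized box corners to pixel coordinates."""
    faces = []
    for confidence, x1, y1, x2, y2 in detections:
        if confidence >= conf_threshold:
            faces.append((round(x1 * frame_w), round(y1 * frame_h),
                          round(x2 * frame_w), round(y2 * frame_h)))
    return faces

# Two candidate boxes on a 640x480 frame; only the confident one survives.
raw = [(0.98, 0.25, 0.20, 0.55, 0.70),   # a confident face detection
       (0.12, 0.60, 0.10, 0.80, 0.40)]   # background noise, filtered out
print(filter_detections(raw, 640, 480))  # [(160, 96, 352, 336)]
```

Each surviving box is then cropped out of the frame and passed to the mask classifier.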

Activities undertaken (implementation and design of experiments)

The activities undertaken started with data visualization, a first step to see how many photographs there are in each of the three categories of the database. The next step was data augmentation, which involves rotating and flipping each photo in the dataset to create more varied training images. The data was then separated into two sets: a training set of photos for the CNN model and a test set, with 80 percent of the photos used for training and the remaining 20 percent for testing. As previously noted, the required proportion of photos was divided across the training and test sets after splitting. The sequential CNN model was then developed using a variety of layers, including Conv2D, MaxPooling2D, Flatten, Dropout, and Dense. A final Dense layer used the softmax activation to produce a vector representing the probability of each class; since the problem has three classes, categorical cross-entropy is the appropriate loss function. To improve accuracy, MobileNetV2 was employed (Dwivedi & Gupta, 2020).
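As a worked example of how the Conv2D and MaxPooling2D layers shrink the spatial dimensions, the arithmetic below traces a 224 × 224 input through an assumed stack of 3 × 3 "valid" convolutions and 2 × 2 poolings; the actual layer counts and kernel sizes of the project's model are not stated, so this is illustrative only:

```python
def conv_out(size, kernel=3, stride=1):
    """Spatial size after a 'valid' (no padding) convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window

# Trace a 224x224 input through conv -> pool -> conv -> pool.
size = 224
for layer in ("conv", "pool", "conv", "pool"):
    size = conv_out(size) if layer == "conv" else pool_out(size)
    print(layer, size)  # conv 222, pool 111, conv 109, pool 54
```

This shrinking is what lets the Flatten and Dense layers at the end work on a manageable number of features.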

In the following stage, a 'train generator' and a 'validation generator' were developed to fit the model, and the CNN model was then trained on the training set. The sequential model constructed with the Keras library is fitted to the training dataset of photos in this stage. The model was trained for over 50 epochs to improve accuracy while avoiding over-fitting. After developing the model, the outcomes were labeled with three probabilities, corresponding to the three possible masking states: "without mask," "with mask," and "improper mask." The next step was importing the face detection program; the facial features were detected using Haar feature-based cascade classifiers.

Images with or without masks are entered into the model. Using a pre-installed face detector module, an image or frame of a video stream is first supplied to identify human faces. The picture or video frame is scaled before blob detection is carried out. Once the face detector model receives this data, it produces a cropped image of the subject's face alone, excluding the backdrop. This cropped face is then submitted as input to the previously trained model (Tan, 2007).

A second model is trained on people's faces. As part of this model's training, photos are supplied with the person's name and email address as labels; OpenCV is used to do this. As soon as the model receives a picture of a face, it invites the user to enter that person's name and email address, which are saved in the database. The output of the first model is provided as input to this one, and the detected face is compared against all others in the database (Tan, 2007).

OpenCV's cascade classifier, trained on hundreds of photos, was used to identify the frontal face. For this, the face was detected by downloading an .xml file and using it to identify the face. An endless loop was run over the webcam feed, accessed with OpenCV's cv2.VideoCapture(0), in which faces were recognized using the cascade classifier. The model predicts each of the three classes (without mask, with mask, and improper mask), and the label corresponding to the highest likelihood is shown around each face.

The findings of the work

The results are in line with the model's predictions. The camera is used as the medium for mask identification, and the findings are accurate. When a face is captured by the camera, detection is indicated by a green or red frame placed around it, distinguishing mask wearers from non-wearers. The outcome is also shown as text in the upper-left corner of the result frame, along with a percentage match at the top of the result window. The model still functions even if the camera sees only the side of the face, and it is capable of detecting many faces in a single frame of video footage.

The model was able to distinguish between mask wearers who were appropriately covering their faces and those who were not. Datasets are used to train, validate, and test the model, and the approach achieves a 95 percent accuracy rate based on the data analysis. MaxPooling is a major factor in obtaining this level of precision: it adds translation invariance to the internal representation and reduces the number of parameters the model must learn. This sample-based discretization process down-samples the input by reducing its spatial dimensionality. The system is also capable of detecting faces that are partially obscured by a mask, hair, or even a hand; the degree of occlusion in the nose, mouth, chin, and eye areas is used to distinguish annotated masks from hand-covered faces, and a face is only labeled "with mask" when the mask covers everything from the nose to the chin.
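The down-sampling behavior of MaxPooling described above can be demonstrated on a toy 4 × 4 feature map: each non-overlapping 2 × 2 window is reduced to its maximum, halving each spatial dimension and discarding small translations of the strongest activation:

```python
def max_pool_2x2(grid):
    """Down-sample a 2-D grid: keep the maximum of each non-overlapping 2x2 window."""
    return [[max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
```

Shifting the strongest value by one pixel inside its window leaves the pooled output unchanged, which is the translation invariance the text refers to.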

Diverse perspectives and a lack of precision are the method's biggest obstacles, and detection is more challenging when there are moving faces in the video feed. Tracking motion across numerous frames of the video helps make a more informed choice between "with mask" and "without mask." As described above, the second model is trained with OpenCV on photos labeled with each person's name and email address: when it recognizes a face, it asks the user for the person's name and email address, which are kept in the database, and the output of the first model is provided as input to this one. The detected face is then compared against all faces in the database. A bounding box is drawn around each person's face, with "Mask" or "Without Mask" shown below it to indicate whether a mask is being worn. A person's name can be retrieved from the database when the face is not hidden, and even a person whose identity is concealed behind a mask can be localized by the bounding box around their face. When the system recognizes a person who is not wearing a mask and finds a match in the database, an email is sent notifying that person that they are not wearing a mask so they may take measures, and an SMS is likewise sent to registered individuals alerting them to the dangers of not wearing a mask.
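A minimal sketch of how the email alert could be composed with Python's standard library; the recipient details are hypothetical, and actually dispatching the message would require an smtplib connection to a configured SMTP server, which this work does not specify:

```python
from email.message import EmailMessage

def build_mask_alert(name, address):
    """Compose (but do not send) the notification email for a person
    detected without a mask; sending would use smtplib separately."""
    msg = EmailMessage()
    msg["To"] = address
    msg["Subject"] = "Face mask reminder"
    msg.set_content(
        f"Hello {name},\n"
        "Our monitoring system detected you without a face mask. "
        "Please wear one correctly to protect yourself and others."
    )
    return msg

# Hypothetical recipient pulled from the face database.
alert = build_mask_alert("Jane Doe", "jane@example.com")
print(alert["To"], "|", alert["Subject"])
```

Separating message construction from delivery keeps the alert logic testable without a live mail server.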

Conclusions and additional research

In this research, a CNN system was successfully constructed to determine whether a person was wearing a mask, was not wearing one, or was wearing one inappropriately. As COVID cases rise worldwide, the necessity for a technology to replace humans in checking people's masks has never been greater, and this system meets that requirement. Public venues such as train stations and malls may benefit from the technology, and it will be especially useful in large organizations with a high concentration of employees: the system makes it simple to gather and retain information on a company's workers, identify those without a mask, and send an email alerting them to the dangers of not wearing one. This has a wide range of uses. The COVID-19 situation may necessitate wearing a face mask in the coming years, and this way of determining whether a person is wearing a mask could be useful.

Additional research should focus on how coughing and sneezing detection can be implemented as part of the COVID-19 detection methodology. In addition to identifying the mask, such a system would calculate the distances between participants and look for any sign of coughing or sneezing. An "improper mask" label may be applied to images when the mask is not worn correctly. Researchers might also propose adaptive models, better optimizers, and tweaks to parameter setup.

Updating and installing the mask recognition system in retail stores will be part of the ongoing effort, and the results will be visible on digital and promotional displays. People who are not wearing a mask may be identified with this model using any existing USB, IP, or surveillance camera. The real-time video mask detection tool can be integrated into web and desktop applications, allowing an operator to determine whether people are wearing masks and thereby obviating the need for alerts; images of anyone not wearing a mask can be sent to the software operators. Researchers can also install an alarm system that sounds a buzzer if someone accesses the area without a mask. Connected to entry gates, the program allows only people who wear face masks to enter. Schools, shopping malls, and many other public places might benefit from this approach.


References

Bennett, C. (2021). Refusal to wear a mask says more about you than your face ever could | Catherine Bennett. The Guardian. Retrieved March 22, 2022, from https://www.theguardian.com/commentisfree/2021/dec/05/refusal-to-wear-a-mask-says-more-about-you-than-your-face-ever-could.

Dwivedi, S., & Gupta, N. (2020). A new hybrid approach on Face Detection And Recognition. https://doi.org/10.31219/osf.io/r7984

Ejaz, M. S., Islam, M. R., Sifatullah, M., & Sarker, A. (2019). Implementation of principal component analysis on masked and non-masked face recognition. 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT). https://doi.org/10.1109/icasert.2019.8934543.

Hassen, S., & Adane, M. (2021). Facemask-wearing behavior to prevent COVID-19 and associated factors among public and private bank workers in Ethiopia. PLOS ONE, 16(12). https://doi.org/10.1371/journal.pone.0259659

Huang, Y., Qiu, C., Wang, X., Wang, S., & Yuan, K. (2020). A compact convolutional neural network for surface defect inspection. Sensors, 20(7), 1974. https://doi.org/10.3390/s20071974.

Kumar, P. (2016). Approach on face recognition & detection techniques. International Journal Of Engineering And Computer Science. https://doi.org/10.18535/ijecs/v5i7.03

Klyuzhin, I. S., Xu, Y., Ortiz, A., Ferres, J. L., Hamarneh, G., & Rahmim, A. (2020). Testing the ability of convolutional neural networks to learn radiomic features. https://doi.org/10.1101/2020.09.19.20198077

Lawrence, S., Giles, C. L., Tsoi, A. C., & Back, A. D. (1997). Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, 8(1), 98–113. https://doi.org/10.1109/72.554195.

Tan, T. (2007). From canonical face to synthesis - an illumination invariant face recognition approach. Face Recognition. https://doi.org/10.5772/4854.

Venkateswarlu, I. B., Kakarla, J., & Prakash, S. (2020). Face mask detection using MobileNet and global pooling block. 2020 IEEE 4th Conference on Information & Communication Technology (CICT). https://doi.org/10.1109/cict51604.2020.9312083.
