CNN model
Introduction
The COVID-19 outbreak has seen a shift in the way scientific intelligence is utilized to address current and emerging challenges. Preventative methods like face masks have been essential in limiting the droplets that pass through or escape the mask (Hassen & Adane, 2021). Most governments around the globe have made mask use mandatory in certain situations, either via national law or, as in the United States, on a state-by-state basis. Based on multiple studies, health officials think that wearing masks may help prevent the spread of COVID-19, especially for individuals who come into close contact with sick patients (Bennett, 2021).
There are a variety of variables that influence how much protection is provided, including the kind of mask used, the quantity of virus present, and the environment. If masks are beneficial, then the more people who comply with mask regulations, the greater the collective health benefit. To put it another way, the pandemic's influence on a country's social and economic well-being is closely related to how successfully public health responds. Non-compliance with mask rules is higher in some nations, such as the United States. Bennett (2021) observes that, despite mask regulations in numerous states and recommendations from the Centers for Disease Control and Prevention (CDC), several people opt not to use face coverings as a way to protect themselves and others against COVID-19. Face mask adherence may be improved by technological innovation, particularly machine learning techniques, which are significant in face localization using convolutional neural networks and can be used to increase compliance (Klyuzhin et al., 2020).
The main objective of this study is to build a model that can distinguish between those who are wearing masks, those who are not, and those who are wearing masks incorrectly. This project seeks to develop a highly accurate, real-time system that can effectively recognize non-masked faces, masked faces, and incorrectly masked faces in public, and thereby encourage adherence to the correct health rules on face masks. Face detection and identification algorithms have been developed by many academics; however, there is a fundamental distinction between detecting individuals who are wearing masks, those who are not, and those who are wearing masks incorrectly (Kumar, 2016). According to the existing literature, few studies have tried to identify persons wearing masks. The goal of this research is to create methods for detecting people wearing masks over their faces in public places like theaters and shopping malls. Distinguishing a face with or without a mask in public is difficult because the dataset for identifying masks on human faces is relatively small. All kinds of face photos are included in the collection, including those with and without masks.
Literature review
Principal Component Analysis (PCA) is a method that can be used to quickly identify or verify a person, according to a study by Ejaz et al. (2019), although it can be time-consuming. Their studies confirm that facial recognition is a complex process due to the prevalence of occlusions such as eyeglasses, veils, scarves, and other forms of make-up or disguising materials; such occlusions hinder face detection. Non-masked facial recognition algorithms developed lately, according to Ejaz et al. (2019), are frequently utilized and provide improved performance. However, little progress has been made in the area of masked face recognition. Consequently, their study adopted an analytical strategy applicable to non-masked, improperly masked, and masked face recognition. PCA is one of the most effective and extensively used statistical methods (Ejaz et al., 2019); as a result, the PCA method was selected, and comparative studies were also carried out to gain a better understanding of the topic.
Venkateswarlu et al. (2020) proposed a pre-trained MobileNet that integrates a global pooling block for detecting face masks. The pre-trained MobileNet builds a multi-dimensional feature map from a color image, and the global pooling block in the proposed model transforms that feature map into a 64-feature vector. Finally, a softmax layer uses the 64 features for binary classification. They tested their model on two publicly available datasets, achieving 99.9% and 100% accuracy on DS1 and DS2, respectively. The model minimizes overfitting by using a global pooling block, and it surpasses previous models in parameter count and training time. This model, on the other hand, is unable to recognize many face masks at the same time.
Convolutional neural networks (CNNs) have had a significant impact on computer vision, according to a study by Huang et al. (2020). To handle huge calculations, the bulk of current CNNs depend primarily on costly GPUs (graphics processing units). As a result, CNNs have yet to be extensively used in industry for inspecting surface flaws. The CNN-based model developed in that research delivers strong performance on microscopic defect detection while running on low-frequency CPUs (central processing units), which is the goal of the work (Huang et al., 2020). A decoder and a lightweight (LW) bottleneck make up the Huang et al. (2020) model. In their experiments, the researchers discovered that CNNs may be small and hardware-friendly, making them suitable for potential developments in automated defect identification (Kumar, 2016).
One of the most promising approaches to human face recognition has been developed by Lawrence et al. (1997), who used a hybrid neural network. The system combines a self-organizing map (SOM) and convolutional neural networks with local image sampling. Convolutional neural networks give partial invariance to translation, rotation, scaling, and deformation, whereas SOMs provide dimensionality reduction and invariance to slight changes in the image sample, making the convolutional networks easier to train and test. In a series of layers, the convolutional network extracts successively larger features (Klyuzhin et al., 2020).
Summary of the reviewed literature
In summary, it is clear from the examined research that face recognition systems are capable of detecting partly obstructed faces. The degree of occlusion in four areas (the nose, mouth, chin, and eyes) is used to distinguish between annotated masks and hand-covered faces. Because of this, a face is only considered "with mask" when a full-face mask covers everything from the nose to the chin. Identifying those who have violated COVID regulations enables an improved face mask detection method. If implemented correctly, a face mask detection system can help ensure our safety and that of others (Nieto-Rodríguez et al., 2015). This method not only achieves high accuracy but also significantly speeds up face detection. The system might be used in a variety of locations, including subway stations, markets, schools, train stations, and airports (Tan, 2007). As a final benefit, this study may be referenced by other scientists in the future. Furthermore, the model is compatible with any HD camcorder, so it is not limited to face mask detection; for example, biometric scans could still be performed while a mask is worn.
The system has reached a respectable level of accuracy using simple machine learning tools and methodologies, and a wide range of uses is possible. In light of the COVID-19 situation, wearing a mask may become mandatory in the near future. To use the services of several public service providers, clients are required to put on proper face masks (Kaur et al., 2022). Putting the model in place would have a significant impact on the public health system. Detecting whether a person is wearing the mask correctly might be added to this system in the future. To determine whether the mask is susceptible to viruses, the model may be further refined to determine whether a surgical or N95-type mask is being worn.
Solution Reporting
Planned research,
methodology, and evaluation methods
The planned research involved the development of a Convolutional Neural Network (CNN) model utilizing TensorFlow with the Keras library and OpenCV to recognize persons who are wearing masks, those who are wearing masks improperly, and those who are not wearing masks. The Face Mask Detection | Kaggle dataset is being used to construct this model. Each picture in this dataset is classified as either with, without, or incorrectly wearing a face mask; 853 photos are included in the collection. TensorFlow will be used to develop a CNN model that can tell whether someone is wearing a face mask by looking at these photos. The pictures are divided into two files, a "training dataset" and a "test dataset," which contain 80% and 20% of the total number of images, respectively. There is a multitude of ways to build bounding boxes, often called "data annotations," around a specific selected area. Images will be labeled as "masked," "without a mask," and "improper mask" in the proposed model, with the LabelImg tool being used to do this. Pre-processing and segmentation techniques are used to enhance the image's focus on the foreground objects.
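The 80/20 split described above can be sketched as follows. This is a minimal illustration with placeholder file names; the actual Kaggle image files and annotation layout are assumptions, not taken from the paper.

```python
# Sketch: split the 853 annotated image paths 80/20 into training and
# test sets, as described above. File names are placeholders.
import random

def split_dataset(paths, train_frac=0.8, seed=42):
    """Shuffle the paths and split them into train/test lists."""
    rng = random.Random(seed)
    shuffled = paths[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

paths = [f"img_{i:03d}.png" for i in range(853)]  # the dataset has 853 photos
train, test = split_dataset(paths)
```

Shuffling before the cut keeps the class mix in both sets roughly similar to the full dataset.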
The implementation of this model will include executing the trials on an Intel Core i7 CPU with an Nvidia GTX 1080 graphics card under Windows 10. The system utilizes Python 3.5 as its programming language. To handle and analyze embedded images, it relies on the PyTorch module in Python 3.5, as well as MATLAB 2019. The pre-trained model may be used only with 224 x 224 x 3 image frames.
Once the model has been trained, it may then be used to predict whether a person has worn a mask appropriately. Using Google Colab, an online GPU environment, the system is built to differentiate between persons who are wearing masks, those who are not, and those who are wearing masks inappropriately. A folder called "the trained folder" is employed for training. A test folder is created for this model, and it is put to the test to see whether it can distinguish between masks and no-masks in the original photographs. As training goes on, the scaling factor decreases by a factor of 0.9 every 10 iterations. The Adam optimizer uses a momentum value of 0.999. The training procedure is repeated until 100 epochs have elapsed.
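The decay described above (a 0.9 scale factor every 10 iterations) amounts to a staircase exponential decay, sketched here as a small helper. The base learning rate is an assumption; in Keras this maps onto tf.keras.optimizers.schedules.ExponentialDecay with staircase=True, combined with Adam, whose second-moment decay (beta_2) defaults to 0.999 as the text notes.

```python
# Sketch of the learning-rate schedule: the rate decays by a factor of
# 0.9 once every 10 iterations (staircase exponential decay).
def decayed_lr(base_lr, iteration, decay_rate=0.9, decay_steps=10):
    """Return the learning rate in effect after `iteration` steps."""
    return base_lr * decay_rate ** (iteration // decay_steps)

# With an assumed base rate of 0.001:
lr_start = decayed_lr(0.001, 0)    # 0.001
lr_later = decayed_lr(0.001, 25)   # two decay steps have occurred
```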
A reliable face detection model is necessary after training the classifier, so that the model can distinguish between those who are wearing masks, those who are not, and those who are wearing them inappropriately (Li et al., 2020). The goal of this work is to improve mask detection accuracy while using fewer resources. OpenCV's deep neural network (DNN) module, which includes an object detection model called the "Single Shot Multibox Detector" (SSD), is utilized for this job. The SSD model relies on ResNet-10 as the architecture's backbone. Even embedded devices like the Raspberry Pi may benefit from this kind of face detection.
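A minimal sketch of how such SSD detections are typically post-processed: the 1 x 1 x N x 7 output layout below matches OpenCV's res10 SSD face detector, and the model file names in the usage comment are the standard OpenCV sample files, assumed to be downloaded locally; the confidence threshold is an assumption.

```python
# Sketch: filter SSD detections by confidence and convert normalized
# box coordinates back to pixel coordinates.
import numpy as np

def faces_from_detections(detections, w, h, conf_threshold=0.5):
    """Return pixel-space (x1, y1, x2, y2) boxes above the threshold."""
    boxes = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence >= conf_threshold:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            boxes.append(tuple(box.astype(int)))
    return boxes

# Typical usage (assumes the OpenCV sample model files exist locally):
#   net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
#                                  "res10_300x300_ssd_iter_140000.caffemodel")
#   blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 177, 123))
#   net.setInput(blob)
#   boxes = faces_from_detections(net.forward(), frame_w, frame_h)
```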
Activities undertaken
(e.g. Any implementation &/or design of experiments)
The activities undertaken started with data visualization, a first step to see how many photographs there are in each of the three categories in the database. The next step was data augmentation, which involves rotating and flipping each of the photos in the dataset to create a larger, more varied training set. The data was then separated into two sets: a training set of photos for the CNN model and a test set of images for evaluation; 80 percent of the total photos were used for training, while the remaining 20% were used for testing. As previously noted, the required proportion of photos was divided across the training and test sets after splitting. The Sequential CNN model was then developed using a variety of layers, including Conv2D, MaxPooling2D, Flatten, Dropout, and Dense. One last Dense layer used the softmax activation to produce a vector representing the probability of each class occurring, and cross-entropy was used as the loss function. To improve accuracy, MobileNetV2 was employed (Dwivedi & Gupta, 2020).
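The Sequential model described above can be sketched as follows. The filter counts and layer sizes are assumptions, since the paper does not specify them; three output units with softmax cover the "with mask", "without mask", and "improper mask" classes, so categorical cross-entropy is used here.

```python
# A minimal sketch of the Sequential CNN (layer sizes are assumptions).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(224, 224, 3), num_classes=3):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),                          # reduces over-fitting
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A pre-trained MobileNetV2 backbone could replace the two Conv2D blocks, as the text notes, to improve accuracy.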
In the following stage, we developed a 'train generator' and a 'validation generator' to fit the model. The CNN model was then trained using the training set built earlier; at this stage, the Sequential model constructed with the Keras library is fitted to the training dataset of photos. Training was performed over 50 epochs to improve accuracy and avoid over-fitting. After developing the model, three probabilities were labeled for the outcomes, corresponding to the three possible masking states: "without a mask," "with mask," and "improper mask." Importing the face detection program was the next step; facial features were detected using the Haar feature-based cascade classifiers.
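The three output probabilities can be decoded into a label as sketched below; the class ordering and names are assumptions for illustration.

```python
# Sketch: map the model's three output probabilities to a label.
import numpy as np

LABELS = ["without_mask", "with_mask", "improper_mask"]  # assumed order

def decode_prediction(probs):
    """Return (label, confidence) for the highest-probability class."""
    probs = np.asarray(probs)
    idx = int(np.argmax(probs))
    return LABELS[idx], float(probs[idx])
```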
Images with or without masks are entered into the model. Using a pre-installed face detector module, an image or frame of a video stream is first supplied to identify human faces. The picture or video frame is scaled before blob detection is carried out. Once the face detector model receives this data, it produces a cropped image of the subject's face only, excluding the backdrop. This face is then submitted as input to the previously trained model (Tan, 2007).
Another model is trained on people's faces. As part of this model's training, photos are supplied with the person's name and email address as the labels; OpenCV is used to do this. As soon as the model receives a picture of a face, it invites the user to enter that person's name and email address, which are saved in the database. The output of the first model is provided as input to this second model, and the face is compared against all others in the database (Tan, 2007).
OpenCV's cascade classifier, trained on hundreds of photos, was used to identify the frontal face. The face was detected by downloading an .xml file and using it with the classifier. OpenCV was then used to run an endless loop on the webcam in which faces were recognized using the cascade classifier; in the code, webcam = cv2.VideoCapture(0) denotes capturing from the default webcam. Each of the three classes (without a mask, with mask, and inappropriate mask) is predicted by the model, and the label with the greatest likelihood is shown around the face.
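The webcam loop described above can be outlined as follows. The capture loop itself is left in comments because it needs a camera, a trained model, and a cascade classifier; the colour mapping (green for a correctly worn mask, red otherwise) is an illustrative assumption.

```python
# Sketch: choose the frame colour drawn around a detected face from the
# predicted label. OpenCV uses BGR channel order.
def frame_color(label):
    """Green for a correctly worn mask, red otherwise (assumed mapping)."""
    return (0, 255, 0) if label == "with_mask" else (0, 0, 255)

# Capture loop outline (assumes cv2, a trained `model`, a `face_cascade`
# classifier, and a `classify` helper are available):
#   webcam = cv2.VideoCapture(0)          # 0 selects the default webcam
#   while True:
#       ok, frame = webcam.read()
#       if not ok:
#           break
#       for (x, y, w, h) in face_cascade.detectMultiScale(frame):
#           label = classify(model, frame[y:y+h, x:x+w])
#           cv2.rectangle(frame, (x, y), (x+w, y+h), frame_color(label), 2)
#       cv2.imshow("mask detection", frame)
```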
The findings of the
work
The results are in line with the model's predictions. The camera is used as the medium for mask identification, and the findings are accurate. Face detection is done by placing a green or red frame around an individual's face when it is captured by the camera, with the colour indicating whether a mask is worn. The outcome is also shown as text in the result frame's upper left corner, and a percentage match is displayed at the top of the result window. The model still functions even if the camera sees only the side of the face, and it is capable of detecting many faces in a single frame of video footage.
When it came to mask wearers, the model was able to distinguish between those who were appropriately covering their faces and those who were not. Datasets are used to train, validate, and test the model. Based on the data analysis, the approach has a 95 percent accuracy rate. MaxPooling is a major factor in obtaining this level of precision: it adds translation invariance to the internal representation and reduces the number of parameters the model must learn. This sample-based discretization process down-samples the input by reducing the dimensionality of its representation. The system is capable of detecting faces that are partially obscured by a mask, hair, or even a hand. The degree of occlusion in four areas (the nose, mouth, chin, and eyes) is used to distinguish between annotated masks and hand-covered faces. Because of this, a face is only considered "with mask" when a full-face mask covers everything from the nose to the chin.
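The 2x2 max-pooling operation credited above with adding translation invariance and reducing parameters can be demonstrated on a small array: each non-overlapping 2x2 window is reduced to its maximum value, halving both spatial dimensions.

```python
# Sketch: 2x2 max pooling on a small 2D array.
import numpy as np

def max_pool_2x2(x):
    """Down-sample a 2D array (even height and width) by 2x2 max pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 0, 1],
              [3, 4, 1, 0],
              [0, 1, 5, 6],
              [1, 2, 7, 8]])
pooled = max_pool_2x2(x)  # each 2x2 block collapses to its maximum
```

Small shifts of a feature within a 2x2 window leave the pooled output unchanged, which is the translation invariance the text refers to.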
Diverse perspectives and a lack of precision are the method's biggest obstacles, and moving faces in the video feed make detection more challenging. To make a more informed choice between "with mask" and "without mask," it helps to track motion across numerous frames of the video. As described earlier, a second model is trained on photos labeled with each person's name and email address using OpenCV; when this model recognizes a face, it asks the user for the person's name and email address, which are kept in the database. The output of the first model is provided as input to this second model, and the face is compared against all others in the database. If the individual is not wearing a mask, "Without Mask" appears below the bounding box instead of "Mask," and the outline painted around the person's head indicates whether or not they are wearing a mask. A person's name may be retrieved from the database even when their identity is partially concealed behind a mask, and drawing a bounding box around the face makes it easy to identify a person wearing one. When the system recognizes a person who is not wearing a mask, it searches the database for a match; if that person's face has been registered in the database, an email and an SMS are sent notifying them that they are not wearing a mask and alerting them to the dangers of not wearing one, so that they may take measures.
Conclusions and
additional research
In this research, a CNN system was successfully constructed to determine whether a person was wearing a mask, was not wearing one, or was wearing one inappropriately. As COVID cases rise worldwide, the necessity for a technology to replace humans in the process of checking people's masks has never been greater, and this system meets that requirement. Public venues like train stations and malls may benefit from this technology. It will be especially useful in large organizations with a high concentration of employees, since it makes it simple to gather and retain information on the company's workers, to identify those without a mask, and to send an email alerting them to the dangers of not donning one. This has a wide range of uses. The COVID-19 situation may necessitate wearing a face mask in the coming years, and this way of determining whether a person is wearing a mask could be useful.
Additional research should focus on how coughing and sneezing detection can be implemented as part of the COVID-19 detection methodology. In addition to identifying the mask, such a system would calculate the distances between participants and look for any sign of coughing or sneezing. An 'improper mask' label may be applied to images if the mask is not worn correctly. Researchers might also propose adaptive models, better optimizers, and tweaks to parameter setup.
Updating and installing the mask recognition system in retail stores will be part of the ongoing effort, and the results will be visible on digital and promotional displays. Persons who are not wearing a mask may be identified with this model using any existing USB, IP, or surveillance camera. The real-time video mask detection tool can be integrated into online and desktop applications, allowing the operator to determine whether people are wearing masks and thereby obviating the need for alerts. Images should be sent to software operators if somebody is not wearing a mask. Researchers can also install an alarm system that will sound a buzzer if someone accesses the area without a mask. This program, which may be connected to entry gates, allows only persons who wear face masks to enter. Schools, shopping malls, and many other public places might benefit from this approach.
References
Bennett, C. (2021). Refusal to wear a mask says
more about you than your face ever could | Catherine Bennett. The Guardian.
Retrieved March 22, 2022, from https://www.theguardian.com/commentisfree/2021/dec/05/refusal-to-wear-a-mask-says-more-about-you-than-your-face-ever-could.
Dwivedi, S.,
& Gupta, N. (2020). A new hybrid approach on Face Detection And
Recognition. https://doi.org/10.31219/osf.io/r7984
Ejaz, M. S., Islam, M. R., Sifatullah, M., &
Sarker, A. (2019). Implementation of principal component analysis on masked and
non-masked face recognition. 2019 1st International Conference on Advances
in Science, Engineering and Robotics Technology (ICASERT). https://doi.org/10.1109/icasert.2019.8934543.
Hassen, S.,
& Adane, M. (2021). Facemask-wearing behavior to prevent COVID-19 and
associated factors among public and private bank workers in Ethiopia. PLOS
ONE, 16(12). https://doi.org/10.1371/journal.pone.0259659
Huang, Y., Qiu, C., Wang, X., Wang, S., & Yuan, K.
(2020). A compact convolutional neural network for surface defect inspection. Sensors,
20(7), 1974. https://doi.org/10.3390/s20071974.
Kumar, P. (2016). Approach on face recognition & detection techniques. International Journal of Engineering and Computer Science. https://doi.org/10.18535/ijecs/v5i7.03
Klyuzhin, I. S., Xu, Y., Ortiz, A., Ferres, J. L.,
Hamarneh, G., & Rahmim, A. (2020). Testing the ability of convolutional
neural networks to learn radiomic features.
https://doi.org/10.1101/2020.09.19.20198077
Lawrence, S., Giles, C. L., Ah Chung Tsoi, & Back,
A. D. (1997). Face recognition: A convolutional neural-network approach. IEEE
Transactions on Neural Networks, 8(1), 98–113. https://doi.org/10.1109/72.554195.
Tan, T.
(2007). From canonical face to synthesis - an illumination invariant face
recognition approach. Face Recognition. https://doi.org/10.5772/4854.
Venkateswarlu, I. B., Kakarla, J., & Prakash, S.
(2020). Face mask detection using the mobile net and global pooling block. 2020
IEEE 4th Conference on Information & Communication Technology (CICT).
https://doi.org/10.1109/cict51604.2020.9312083.