A light trap and computer vision system to detect and classify live moths (Lepidoptera) using tracking and deep learning

This paper presents a portable computer vision system that is able to attract and detect live insects. More specifically, the paper proposes detection and classification of species by recording images of live moths captured by a light trap. A light trap with multiple illuminations and a camera was designed to attract and monitor live insects during twilight and night hours. A computer vision algorithm referred to as Moth Classification and Counting, based on deep learning analysis of the captured images, then tracked and counted the number of insects and identified moth species. This paper presents the design and the algorithm that were used to determine and identify the moth species. Observations over 48 nights resulted in the capture of more than 250,000 images with an average of 5,675 images per night. A customized convolutional neural network was trained on 864 labelled images of live moths, which were divided into eight different species, achieving a high validation F1-score of 0.96. The algorithm measured an average classification and tracking F1-score of 0.83 and a tracking detection rate of 0.79. This result was based on an estimate of 83 individual moths observed during one night with insect activity in 122 minutes collecting 6,000 images. Overall, the proposed computer vision system and algorithm showed promising results in nondestructive and automatic monitoring of moths as well as classification of species. The system provides a cost-effective alternative to traditional methods, which require time-consuming manual identification and typically provide only a coarse temporal resolution of the captured data. In addition, the system avoids depleting moth populations in the monitoring process, which is a problem in traditional traps that kill individual moths.
As image libraries grow and become more complete, the images captured by the trapping system can be processed automatically and allow users with limited experience to collect data on insect abundance, biomass, and diversity.


Introduction
There is growing evidence that insect populations are declining in abundance, diversity, and biomass Wagner (2020). Multiple anthropogenic stressors are proposed as drivers of these changes Harvey et al. (2020), but the complexities of the responses and differences in the ecology of the millions of species involved make it difficult to pinpoint causes of these declines as well as mitigation nendijk and Ellis (2011). Recently, however, stable populations have also been observed Macgregor et al. (2019). Changes in the abundance of moths could have cascading effects through the food web, suggesting that moths are a very relevant group of insects to monitor more effectively in the context of global change.
The most widely applied method of monitoring moths is the use of light traps Jonason et al. (2014). The moths are typically killed and then manually counted and classified. However, most methods require experts and tedious manual labor to visit the traps and classify individual species. In this paper, we describe a new method for automatic counting and classification of moth species in order to accelerate the study of moth population ecology and minimize the impact on moth populations.

Related work
Attempts to trap and record insects with computer vision or identify moths based on images have already been described in various papers.
A survey of moths using light traps was carried out by Jonason et al. (2014), where two different light sources were used to attract different species of moths. The specimens were collected in a box without killing them. The box was then emptied every morning for manual species identification. The survey explored the influence of weather, time of year and light source. It was observed that high levels of species richness and abundance per night were mainly influenced by a high temperature, low humidity and a lamp type with high power intensity. It was concluded that sampling during the warmest nights in each month (March to October) achieved the best results for understanding the species' phenology. Inspired by the work of Jonason et al. (2014), our work includes the use of light to attract insects and moths during twilight and night hours.
Ding and Taylor (2016) have presented a trap for automatic detection of moths, which contained a pheromone lure to attract insects of interest and an adhesive surface, where the insects became stuck. Digital images with a relatively low resolution (640x480 pixels) were taken of the dead and live insects at a fixed time point daily and transmitted to a remote server. The paper showed that it is possible to train a convolutional neural network (CNN) to identify moths. However, it only dealt with a binary classification problem to recognize if an insect in the trap was a moth or another type of pest insect.
Watson et al. (2004) and Mayo and Watson (2007) have provided an analysis of 774 live individuals from 35 different moth species to determine whether computer vision techniques could be used for automatic species identification. Focusing on data mining for feature extraction and a Support Vector Machine (SVM) for classification, their work achieved a classification accuracy of 85% among 35 classes distributed across 774 images.
While previous analyses of the same dataset required manual selection of regions on the moths to identify an individual, Mayo and Watson (2007) could effectively classify from a single image. In addition, Batista et al. (2010) have proposed a nearest neighbor algorithm based on features such as texture, color and shape to classify moth species from the same dataset, obtaining an accuracy of 79.53%. The works of Watson et al. (2004), Mayo and Watson (2007) and Batista et al. (2010) were published before CNNs became a widespread technology, and they contributed a foundation for solving the problem of species classification with deep learning.
Several of the issues associated with moth classification when the dataset contains a large number of classes have been illustrated by Chang et al. (2017). Their work presented a dataset of 636 butterfly and moth species distributed over 14,270 highly detailed images, which were collected using internet search engines. The challenge with such a large dataset is that the variation between species is minimal while the variation within the individual classes can be large. In addition, the dataset consisted of images with complex backgrounds, which makes it difficult to distinguish the individual insects. This makes it necessary to use more complex and larger models such as Visual Geometry Group (VGG) Simonyan and Zisserman (2015), Inception Szegedy et al. (2015) and Residual Networks (ResNet) He et al. (2016) to perform classification reliably. Furthermore, training an effective model is challenging for rare species, where there is not enough data. In our work, we present a simpler CNN model, which is able to classify moth species based on images with a controlled illumination, background and camera setup.
Several others Chang et al. (2017); Xia et al. (2018) have proposed deep CNNs to make fine-grained insect, butterfly and moth classifications based on images from the Internet. The challenge of using this approach is that butterflies and moths are typically captured with unfolded wings in such images. This is not the natural resting position for live moths on a plane surface and thus not the way they would appear in a light trap such as the one presented in the present paper.
Zhong et al. (2018) have also worked on a trap that captures individual insects for later analysis. Their system was implemented on a Raspberry Pi 2 model B with the corresponding Raspberry camera module. An algorithm was proposed for the detection and classification of different orders of flying insects. To detect and count the number of individuals, a first version of the deep neural network "You Only Look Once" (YOLO) Redmon et al. (2016) was used. An SVM was afterwards used to perform classification of insect order based on features. Using a combination of YOLO and SVM minimized the amount of training data required. The work concentrated on the classification of insect orders and thus did not determine different families or species. The result was a counting and classification precision of 92.5% and 90.8% respectively. Since the classification was made with an SVM based on manually defined features, it is unlikely that the method would be easy to expand to moth species classification, as selecting useful features would be challenging. Since the trap captures and retains the individuals, counting the individuals would be a relatively trivial task compared to tracking live moving insects.
This paper presents a new light trap with a camera and several illuminations to attract and monitor flying and resting insects and moth species automatically without killing them.
The computer vision system is based on a Raspberry Pi and a high resolution web camera, which is able to capture detailed images of the individual insects.
In relation to image processing, our paper presents a novelty in the form of an extended image-processing pipeline that also considers the temporal image dimension. Compared to the results in Watson et al. (2004); Xia et al. (2018) and Eliopoulos et al. (2018), we present a fully functional image-processing pipeline that includes tracking of insects and utilizes a CNN for moth species classification in the intermediate steps of the pipeline. In the last part of the pipeline, we incorporate a novel image-processing algorithm named Moth Classification and Counting (MCC), which is able to track and count moth species over time.

Materials and methods
In this section, the portable light trap and computer vision system that was used to detect and monitor live insects and moths is described. The system was designed to attract moths and insects in the night hours and to automatically capture images based on motion. Whenever a change in the camera field of view was detected by the computer vision system, a sequence of images of the moth or insect was captured and stored on a hard drive. Any insect above a certain size would be detected by the camera and its motion would be captured. The construction and selection of components of the system and the algorithm to count and identify the species of moths are described in the following sections.

Hardware equipment
The primary components of the light trap vision system were the ultra-violet (UV-light) fluorescent tube from Bioform (accessed on 1/3-2020) (Article No.: A32b), the light table from Computermester (accessed on 1/3-2020) and the computer vision system as illustrated in figure 1. The purpose of the UV-light was to attract insects to the location of the trap from a long distance. A light table (LED A3 format) placed in front of the computer vision system was covered with a white sheet to ensure a diffuse illumination without reflection in the images. The white uniform background ensured easy detection of the insects and the light table helped to ensure that insects settled on the white sheet. The computer vision system was composed of a light ring from 24hshop (accessed on 1/3-2020), a web camera from Logitech (accessed on 1/3-2020) (Brio Ultra HD Pro Webcam) with a resolution of 3840x2160 pixels and a Raspberry Pi 4 computer. The light ring (diameter 153/200 mm) was placed in front of the light table to ensure a diffuse foreground illumination of the insects. The intensity of the light ring could be adjusted from a combination of yellow and blue/white LED light. Camera focus and exposure were manually adjusted to achieve an optimal image quality as seen in figure 2. The field of view was adjusted in order to ensure sufficient resolution of the insects and to cover approximately half of the area of the light table. A working distance of 200 mm from camera to light table was chosen. This gave a field of view of 320x170 mm and resulted in an image quality sufficient for identifying moth species. The camera and light were adjusted so that the moth species could be identified by an expert based on an enlarged photo, as shown in figure 3.
Sugar water was sprinkled on the white sheet to attract insects. The sugar water caused more insects to stay on the sheet for a longer period of time. However, the trap also worked without sugar water, although less effectively. A motion program running on the Raspberry Pi 4 was installed to capture a sequence of images whenever a movement was detected in the camera view. It was programmed to save images in JPG format whenever it detected change in more than 1500 pixels. This setting ensured that only larger insects were captured and thus filtered smaller insects such as mosquitoes and flies from the footage. The maximum frame rate was limited to 0.5-2 fps, resulting in a time interval of 0.5-2 seconds between images. On warm summer nights with a high level of insect activity, more than 20,000 images were captured per night. To save space on the hard drive, a frame rate of 0.5 fps was selected, which was sufficient for identifying and counting moth species. To save power, the Raspberry Pi was programmed to capture images between 9pm and 5am and to turn off the UV-light during daytime. A relay circuit, placed inside the computer and camera box, was constructed to control the UV-light. A solid-state drive (500 GB) was connected to the Raspberry Pi to store the captured images. Inside the junction box, a DC-DC converter was installed to convert the supply voltage from 12V to 5V. The Raspberry Pi and light ring were supplied with 5V, and the UV-light and the light table were supplied with 12V.
The whole system was supplied with 12V. The power consumption was 12.5 Watt during daytime when the UV-light was turned off and the computer system was idle. During the night, the system used 30 Watt when the UV-light was turned on and the computer vision system was recording images. With a power consumption of only 12.5 Watt during day hours and 30 Watt during night hours, a 12V battery with a 25Wp solar panel and regulator should be sufficient for powering the system during the entire summer period. Figure 4 shows a photograph of the actual constructed light trap and computer vision system seen from the outside.
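As a rough worked example of the power figures above, the daily energy demand can be estimated from the two reported consumption levels. This is only a sketch: it assumes that the 30 Watt night mode covers exactly the 9pm to 5am recording window (8 hours) and that the 12.5 Watt idle mode covers the remaining 16 hours. The function name is illustrative.

```python
def daily_energy_wh(idle_w=12.5, night_w=30.0, night_hours=8.0):
    """Estimate the daily energy use in watt-hours from the two reported power levels."""
    idle_hours = 24.0 - night_hours  # assumption: idle whenever not recording
    return idle_w * idle_hours + night_w * night_hours


energy = daily_energy_wh()
charge_ah = energy / 12.0  # charge drawn per day from the 12 V supply
print(f"{energy:.0f} Wh per day, about {charge_ah:.1f} Ah at 12 V")
```

Under these assumptions the trap draws about 440 Wh per day, which is the amount a battery and solar panel would have to deliver.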

Counting and classification of moths
We developed a novel computer vision algorithm referred to as Moth Classification and Counting (MCC), which was able to count the number of insects and identify known species of moths. The algorithm produced statistical data of individual moths and their species, as well as the number of unknown insects detected. For every individual moth detected, the time and identity were recorded. In the following subsections, important parts of the algorithm are explained in more detail. However, we will first present a brief overview of the MCC algorithm. The MCC algorithm was composed of a number of sequential steps, where each image in the recording was analyzed as illustrated in figure 5.
The first step was to read an image from a selected observed night, sorted by time of capture. The image was segmented into a black and white image, followed by blob detection to mark a bounding box around each detected insect. This was achieved via the blob detection and segmentation described in section 2.2.1.
The position of each insect region in the image was estimated based on the center of the bounding box.
The second step tracked insects in the image sequence as described in section 2.2.2. Tracking was important for recording the movement and behavior of the individual insect during its stay in the light trap and for ensuring that it was only counted once.
For every insect track, it was evaluated whether the insect was a known moth species. A trained customized CNN, which used a fixed cropped and resized image of each insect, was used to predict moth species. This third step is described in more detail in section 2.2.3.
The final step collected information and derived a summary of counted individuals of known moth species and unknown insects detected and tracked by the algorithm. The summary information was annotated on the image, with each insect track highlighted.

Blob detection and segmentation
A grey-scale foreground image was made by subtracting a fixed background image of the white sheet without insects. Several methods were investigated to segment the foreground into a black and white image, such as a global threshold or adaptive thresholding over regions. The Otsu (1979) threshold algorithm turned out to be the best choice. A black and white image was made by applying the Otsu threshold to the foreground image, followed by morphological open and close operations to filter out small noisy blobs and to close holes in the remaining blobs. Finally, the contours of the blobs were found and the bounding boxes of the insect regions were estimated. Estimating the position of insects based on bounding boxes did not work properly in rare cases, when the individuals were too close to each other. A result of segmentation and bounding box estimation is shown in figure 6.
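The thresholding step can be illustrated with a minimal sketch in plain Python. It is not the implementation used in the trap: images are represented as flat lists of 8-bit grey values, the function names are hypothetical, and the morphological operations and contour search are left out. The sketch only shows background subtraction followed by Otsu's rule of choosing the threshold that maximizes the between-class variance.

```python
def otsu_threshold(values):
    """Return the threshold t (0-255) that maximizes the between-class variance.

    Values <= t form the background class, values > t the foreground class.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the background class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def segment(frame, background):
    """Subtract the background and return a binary mask (True = insect pixel)."""
    foreground = [abs(b - f) for f, b in zip(frame, background)]
    t = otsu_threshold(foreground)
    return [v > t for v in foreground]


# Tiny flattened example: a bright sheet (about 200) with three dark insect pixels.
background = [200] * 8
frame = [199, 198, 197, 198, 50, 40, 45, 199]
mask = segment(frame, background)
print(mask)
```

In this toy frame only the three dark pixels end up in the mask, while the small sensor-noise differences on the sheet fall below the threshold.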

Insect tracking and counting
If the same insect was found in a sequence of images, it should only be counted once. Thus, it was necessary to track the insect in order to count the number of individual insects correctly. The individual insects were relatively stationary during their stay, and images were captured at two-second intervals in case of activity. Therefore, it was assumed that a match was more likely for the shortest distance between two bounding boxes. That is, two boxes that were close to each other were likely to be the same individual.
The position and size of each insect were estimated for every single frame, and tracking could therefore be solved by finding the optimal assignment of insects in two consecutive images. The Hungarian Algorithm Munkres (1957) was our chosen method for finding the optimal assignment for a given cost matrix. In this application, the cost matrix should represent how likely it was that an insect in the previous image had moved to a given position in the current image. The cost function was defined as a weighted cost of distance and area of matching bounding boxes in the previous and current image. The Euclidean distance between the center positions $(x_1, y_1)$ and $(x_2, y_2)$ in the two images was calculated as follows:
$$Dist = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \qquad (1)$$
This distance was normalized according to the diagonal of the image, where $I_h$ and $I_w$ denote the image height and width:
$$MaxDist = \sqrt{I_h^2 + I_w^2} \qquad (2)$$
The area cost (equation 3) was defined as the cost between the sizes of the bounding boxes. A final cost function (equation 4) was defined with a weighted cost of distance $W_{dist}$ and a weighted cost of area $W_{area}$.
The Hungarian Algorithm required the cost matrix to be square and, in our case, it was defined as an $N \times N$ matrix, where each entry was the cost of assigning insect $i$ in the previous image to insect $j$ in the current image. After a match with minimum cost, the entry in the current matrix was assigned a Track ID from the entry in the former. The found Track IDs and entries were stored and used in the upcoming iteration. Dummy rows and columns were added to the matrix to ensure that it was always square.
All entries in the dummy row or column had to have a cost significantly larger than all other entries in the cost matrix to ensure that the algorithm did not make a "wrong" assignment to a dummy. The resource or task being assigned to a dummy could be used to determine which insect from the previous image had left, or which insect had entered into the current image.
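The matching step can be sketched as follows. Several parts are assumptions rather than the method of the paper: the area term is taken as one minus the ratio of the smaller to the larger box area, the weights 0.8 and 0.2 are placeholders, and an exhaustive search over permutations stands in for the Hungarian Algorithm (it finds the same minimum-cost assignment, but is only practical for a handful of insects). Boxes are hypothetical tuples of center x, center y and area.

```python
import math
from itertools import permutations

DUMMY_COST = 1e6  # assumed value, far larger than any real cost (real costs are <= 1)


def pair_cost(prev, curr, img_w, img_h, w_dist=0.8, w_area=0.2):
    """Weighted cost of matching two boxes given as (cx, cy, area).

    The distance term is the Euclidean distance normalized by the image
    diagonal, as in the text. The area term and both weights are assumptions.
    """
    dist = math.hypot(curr[0] - prev[0], curr[1] - prev[1])
    dist_cost = dist / math.hypot(img_w, img_h)
    area_cost = 1.0 - min(prev[2], curr[2]) / max(prev[2], curr[2])
    return w_dist * dist_cost + w_area * area_cost


def assign(prev_boxes, curr_boxes, img_w=3840, img_h=2160):
    """Return (matches, left, entered) for two consecutive images."""
    n = max(len(prev_boxes), len(curr_boxes))
    # Square matrix: rows and columns without a real box keep the dummy cost.
    cost = [[DUMMY_COST] * n for _ in range(n)]
    for i, p in enumerate(prev_boxes):
        for j, c in enumerate(curr_boxes):
            cost[i][j] = pair_cost(p, c, img_w, img_h)

    # Brute-force stand-in for the Hungarian Algorithm (small n only).
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))

    matches, left, entered = [], [], []
    for i, j in enumerate(best):
        if i < len(prev_boxes) and j < len(curr_boxes):
            matches.append((i, j))   # same individual, keeps its Track ID
        elif i < len(prev_boxes):
            left.append(i)           # previous insect matched to a dummy column
        elif j < len(curr_boxes):
            entered.append(j)        # current insect matched to a dummy row
    return matches, left, entered


prev_boxes = [(100, 100, 400), (500, 300, 900)]
curr_boxes = [(505, 298, 880), (102, 101, 410), (900, 900, 300)]
print(assign(prev_boxes, curr_boxes))
```

In this example the two existing insects keep their identities even though their order in the detection list changed, and the third detection is reported as a newly entered insect because it was paired with a dummy row.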
To evaluate the performance of the tracking algorithm, two metrics were defined based on Bashir and Porikli (2006). The False Alarm Rate (FAR) was an expression of the probability that a given track was incorrect. It describes the number of false alarms relative to the total number of tracks; that is, how many times the tracker made a wrong track compared to the number of times it made a track:
$$FAR = \frac{FP}{TP + FP} \qquad (5)$$
A true positive (TP) was defined as an individual that retained its uniquely assigned Track ID during its entire presence in the observation. A false positive (FP) was defined as an individual that was either counted multiple times or assigned a new Track ID.
The Tracking Detection Rate (TDR) was a measure of the number of individuals that maintained their own Track ID in relation to the established ground truth (GT) during the course of the observation:
$$TDR = \frac{TP}{GT} \qquad (6)$$
This measure was therefore used as the primary metric to express the tracker's ability to maintain the same Track ID for the individual insects in an observation.
GT was defined as the total number of unique individuals in the test set measured by manual counting.

Moth species classification
In the field of deep learning, specific architectures of CNNs have provided particularly positive results in many areas of computer vision Liu et al. (2017). CNNs use both pixel intensity values and spatial information about objects in the image. It was a challenging task to find a suitable CNN architecture for classification of moth species. Several CNN architectures Bjerge et al. (2019); Sandler et al. (2018) were investigated. A customized network was finally designed, inspired by Krizhevsky et al. (2017). Hyperparameters of the architecture were explored to find the optimal network architecture to classify moth species. The model was designed to be light and fast so that it could be executed on the embedded Raspberry Pi computer used in the light trap.
Since the camera recorded images of insects with a constant working distance, the insects did not change in size in the images. The moths were labeled with bounding boxes with an average size of 473×475×3 pixels and a standard deviation of 100 for pixel height and width. Initial experiments gave poor results with a resized input size of 224×224×3, which many CNNs Huang and Wang (2017) use. Improved results were achieved by reducing the input size, while still being able to visually identify the moth species. Based on the given camera setup, the bounding boxes were finally downscaled by a factor of approximately seven to a fixed window size of 64×64×3 as input for the customized CNN model.
Several images of eight different species of moths were used for training the CNN model. The CNN model had four layers for feature detection and two fully connected layers for final species classification. The optimal architecture was found by using combinations of hyperparameters for the first and last layer in the CNN. Below are the parameters used to train different CNNs for species classification:
Fixed pool size and stride n × n, n ∈ {2}
Kernel size n × n, n ∈ {3, 5}
Convolutional depth n, n ∈ {32, 128}
Fully connected size n, n ∈ {256, 512}
Optimizer n, n ∈ {Adam, SGD}
The optimal chosen CNN architecture is shown in figure 7. The first layer performed convolution using 128 kernels with a kernel size of 3×3 followed by maximum pooling of size 2×2 and stride 2. The second and third layers performed convolution using 64 kernels with the same kernel and pooling size as mentioned above. The final layer used 128 kernels based on the optimization of hyperparameters. All convolutional layers used the rectified linear unit (ReLU) activation function. The fully connected part had two hidden layers with 2048 and 256 neurons and a softmax activation function in the output layer. Two of the most commonly used optimizers, Adaptive Moment Estimation (Adam) and Stochastic Gradient Descent (SGD), were investigated. While Adam was an optimizer that converged relatively quickly, it did so at the expense of a greater loss. SGD, on the other hand, converged more slowly, but achieved a smaller loss.
The final architecture was chosen because it achieved an average precision, recall and F1-score of 96%, which indicated a suitable model for classification. More results regarding the moth classifier are described in section 3.2.
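The hyperparameter options listed above can be enumerated as a grid. The sketch below assumes one particular reading of the text: kernel size and convolutional depth are varied independently for the first and the last convolutional layer, while the pool size is fixed. The dictionary keys are illustrative names. Under this reading the grid contains 64 combinations, which agrees with the number of trained architectures reported in the results.

```python
from itertools import product

# Assumed reading: kernel size and depth are varied for the first and last layer.
grid = {
    "first_kernel": [3, 5],
    "last_kernel": [3, 5],
    "first_depth": [32, 128],
    "last_depth": [32, 128],
    "fc_size": [256, 512],
    "optimizer": ["Adam", "SGD"],
}


def enumerate_configs(grid):
    """Return one dict per combination of the listed hyperparameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


configs = enumerate_configs(grid)
print(len(configs))  # 64
```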

Summary statistics
Statistics were generated based on the number of counted insects, the number of moth species found, and the number of unknown insects found (i.e. unknown to the trained CNN algorithm). The statistics were updated live as the images were processed and analyzed. Thus, the statistics were always updated through the execution of one iteration of the algorithm (see figure 5). The classification phase classified each individual in every image and assigned a species label to it. As a result, an individual could be classified as a different moth species than in the previous image. The classification phase therefore ensured that the moth species most frequently classified in a track was the one represented in the final statistics. Several parameters were defined and adjusted to filter noisy tracks such as insects flying close to the camera.
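The per-track decision described above amounts to a majority vote. The sketch below is a minimal illustration with hypothetical names: each track is a list of the labels predicted in successive images, short tracks are discarded as noise, and the most frequent label decides the species of the track. The minimum length of 50 images is the threshold mentioned in the discussion; any other filtering parameters used by the authors are not modeled.

```python
from collections import Counter

MIN_TRACK_LENGTH = 50  # an insect had to be detected in at least 50 images


def summarize_tracks(tracks, min_length=MIN_TRACK_LENGTH):
    """Count one individual per track, labeled by its most frequent prediction.

    tracks maps a Track ID to the list of labels predicted along the track.
    """
    counts = Counter()
    for labels in tracks.values():
        if len(labels) < min_length:
            continue  # too short, treated as a noisy track
        species, _ = Counter(labels).most_common(1)[0]
        counts[species] += 1
    return counts


tracks = {
    1: ["Noctua pronuba"] * 40 + ["Xestia c-nigrum"] * 20,  # mixed predictions
    2: ["Autographa gamma"] * 10,                           # flew past the lens
    3: ["unknown"] * 55,
}
print(summarize_tracks(tracks))
```

Track 1 is counted once as Noctua pronuba despite 20 deviating predictions, and track 2 is ignored because it is shorter than the threshold.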

Results
An experiment was conducted in Aarhus, Denmark in the period 3 August to 20 September 2019, capturing 272,683 images in total over 48 nights. The constructed light trap was located close to the forest of Risskov, where it was active in the period from 9pm to 5am each night. The collection of image data was limited to a smaller sample of Danish moths of the family Noctuidae. An observation was defined as the images from a single night's footage. On average, one observation consisted of 5,675 images. It should be noted that the number of images captured in the observations varied significantly. The smallest observation consisted of 134 images, while the largest consisted of 27,300. The primary reason for this was the variation in weather conditions. While the activity of moths was limited during rainy weather or strong winds, for instance, it increased in warm late summer evenings and nights. Figure 8 shows an example of many moths in the trap on a warm summer night.
In the following sections, the results from the experiment, and the last stages of the algorithm concerning tracking, classification and statistics (as described in figure 5) will be presented.

Insect tracking and counting
Tracking was tested on an observation consisting of 6,000 different images distributed over 122 minutes of insect activity. The observation was collected on the night between 25 August and 26 August 2019 and represents the average observation for a single night. The set was selected in order to provide the tracking algorithm with a sufficient challenge. The observation was challenging because the individual insects moved in and out of the camera view repeatedly. In addition, there were examples of two insects sitting close together. Furthermore, some insects flew very close to the camera lens. The algorithm was therefore challenged in its ability to retain the correct tracking. The observation was manually reviewed to establish a ground truth (GT), thereby enabling evaluation of the performance. In total, a GT of 82 individual moths was observed.
The tracker algorithm measured 83 individual tracks, where 65 were identified as true positive (TP) and 18 as false positive (FP) moth tracks. Table 1 shows a calculated Tracking Detection Rate (TDR) of 79% and False Alarm Rate (FAR) of 22% based on the observation of 6,000 images.
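The reported rates can be checked from the counts above. The sketch reads FAR as the share of false positive tracks among all tracks and TDR as the share of true positive tracks relative to the ground truth, which matches the verbal definitions of the two metrics. The function names are illustrative.

```python
def false_alarm_rate(tp, fp):
    """Share of tracks that were wrong: FP / (TP + FP)."""
    return fp / (tp + fp)


def tracking_detection_rate(tp, gt):
    """Share of ground-truth individuals that kept their Track ID: TP / GT."""
    return tp / gt


tp, fp, gt = 65, 18, 82  # counts reported for the 6,000-image observation
print(round(false_alarm_rate(tp, fp), 2))         # 0.22
print(round(tracking_detection_rate(tp, gt), 2))  # 0.79
```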

Moth species classification
From the experiment mentioned in the beginning of this section, images of eight different, frequently occurring species from different dates were selected, annotated and identified to train different CNN models. Examples of individual moth species are shown in figure 9. Table 2 shows an overview of the occurrence of all species in the chosen dataset for training and validation of the CNN algorithm.

Table 1. Tracking results based on the observation of 6,000 images.
Metric    Equation    Result
TDR       6           0.79
FAR       5           0.22

Table 2. Number of labelled images of each moth species in the dataset.
No.    Moth Species           Numbers
1      Agrotis puta           108
2      Amphipyra pyramidea    108
3      Autographa gamma       108
4      Caradrinina            108
5      Mythimna pallens       108
6      Noctua fimbriata       108
7      Noctua pronuba         108
8      Xestia c-nigrum        108
       Total                  864
The moth class Caradrinina consists of the three moth species Hoplodrina ambigua, Hoplodrina blanda and Hoplodrina octogenaria. These species are very similar in appearance and it was too difficult to distinguish between them from the images alone. Data augmentation was applied to all images using vertical and horizontal flips, zoom, different illumination intensities and rotations by different angles. This operation provided more training data and was used to create a uniform distribution of moth species. The dataset was scaled by a factor of 32, resulting in 27,648 images where each class contained 3,456 data points after augmentation. From this dataset, 80% was used for training and 20% for validation of the CNN model.
To find the best CNN architecture for moth species classification, different hyperparameters were adjusted as described in section 2.2.3. A total of 64 architectures were trained using a dropout probability of 0.3 after the second-to-last hidden layer. The average F1-score for all classes was used as a measure for a given architecture's performance. Table 3 shows a summary of the architectures with the highest validation F1-scores. The three best architectures had very high F1-scores, which only varied by 0.00024, but had a varying number of learnable parameters. Compared to SGD, Adam turned out to be the superior optimizer for training of all models as shown in table 3. In the end, the architecture with the highest number of learnable parameters (714,760) was chosen because it had the highest F1-score and was still fast to train. However, there was a risk of overfitting with many parameters and little training data. The chosen model shown in figure 7 had an F1-score of 95.8%, which indicated that the trained CNN was very accurate in its predictions. The confusion matrix (figure 10) was based upon the validation of the chosen model. The confusion matrix has a diagonal trend, which indicates that the model matched the validation set well. The model had a recall of 96%, indicating that only 4% of the moth species in the validation set were missed. A similar precision of 96% was obtained, indicating that only 4% were wrongly classified.
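The geometric part of the augmentation and the resulting dataset size can be sketched in plain Python. Images are represented as nested lists, the function names are hypothetical, and only flips and a 90 degree rotation are shown as examples. Zoom, illumination changes and rotations by arbitrary angles, as well as the exact combination that gives the factor of 32, are not modeled here.

```python
def flip_horizontal(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the row order (up-down flip)."""
    return img[::-1]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augmented_size(n_images, factor=32):
    """Dataset size after scaling every image by the augmentation factor."""
    return n_images * factor


img = [[1, 2],
       [3, 4]]
print(flip_horizontal(img), flip_vertical(img), rotate_90(img))
print(augmented_size(864), augmented_size(108))  # 27648 3456
```

The two printed sizes reproduce the 27,648 images in total and 3,456 images per class stated above.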
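The number of learnable parameters of the chosen model can be reproduced with a short calculation. The sketch assumes 3×3 kernels in all four convolutional layers (the kernel size of the final layer is not stated explicitly), padding that preserves the spatial size, a 2×2 pooling with stride 2 after every convolutional layer, a 64×64×3 input, eight output classes, and a bias per kernel and neuron. Under these assumptions the flattened feature map has 2048 values and the total matches the reported 714,760 parameters.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def count_parameters(input_size=64, channels=3,
                     conv_layers=((3, 128), (3, 64), (3, 64), (3, 128)),
                     hidden=256, classes=8):
    """Return (flattened feature size, total number of learnable parameters)."""
    total = 0
    size, c_in = input_size, channels
    for kernel, c_out in conv_layers:
        total += conv_params(kernel, c_in, c_out)
        size //= 2  # 2x2 max pooling with stride 2, size-preserving convolution assumed
        c_in = c_out
    flat = size * size * c_in
    total += dense_params(flat, hidden) + dense_params(hidden, classes)
    return flat, total


print(count_parameters())  # (2048, 714760)
```

Most of the parameters (524,544) sit in the connection between the 2048 flattened features and the 256-neuron hidden layer.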

Summary statistics
To evaluate the final system, including tracking and classification, the same observation that was used to test the tracking algorithm was used. None of the images from this observation were used to train the CNN model. In this way, the algorithm was tested on data material it had never seen before. The observation represents a typical night, where 6,000 images were collected in the trap. The images were studied to establish a GT. This involved a manual count of occurrences for each moth species. Table 4 shows the algorithm's estimate of the number of moth species and the established GT of the observation. Note that the two species Amphipyra pyramidea and Autographa gamma did not occur in this observation. There were also unknown moth species in the dataset. However, none of them were detected correctly. Table 5 shows the algorithm's precision, recall and F1-score based on the tracked and predicted moths in table 4. In total, a precision of 80%, a recall of 86% and an F1-score of 83% were achieved.
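The F1-score is the harmonic mean of precision and recall, so the overall figure can be checked directly from the two reported values. The function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.80, 0.86), 2))  # 0.83
```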

Discussion
The tracker measured a Tracking Detection Rate (TDR) of 79% which means it tracked the majority of the observed insects correctly. However, the algorithm had a False Alarm Rate (FAR) of 22% which means that nearly one quarter of the detected tracks were incorrect. The test was made on a night with average insect activity, but with many challenges and variations of movements. The MCC algorithm measured an average combination of tracking and classification F 1 -score of 0.83 where recall was 6% higher than precision. This score was an accepted performance of the hole system. However, the final validation was only performed on 82 individual moths and only six of the eight species was represented.
The most frequent source of error were individual insects moving at the edge of the camera's field of view. This resulted in insects disappearing completely or partially from an observation only to potentially appear again later. Furthermore, errors occurred in cases where an insect flew close to the camera lens. In such cases, the algorithm could place multiple boxes on a single individual and make a match with these fake boxes. However, because the insects flying in front of the lens rarely stayed in the field of view for more than a few frames, the design of the tracker often prevented this type of error. An individual insect had to be detected in at least 50 images before it was counted. Consequently, a flying insect appearing in few frames was below the threshold filter value, and the final statistics were therefore not affected. The last identified source of error occurred when two individuals were located closely together in the image. In this case, the tracking algorithm could not separate the two individuals and therefore only placed one bounding box.
One of the primary sources of error in the algorithm was the dataset used for training and validation of the CNN model. An expert reviewed a sample of the dataset and concluded that especially the classes Caradrinina and Amphipyra pyramidea included individuals that were annotated incorrectly. This means that noise was introduced into the dataset, and the performance of the CNN model must therefore be expected to have been degraded. Collecting a sufficiently large dataset with enough data points for efficient classification of the rarer species was also a significant challenge.
The current classification algorithm relies heavily on padding the bounding box found during blob segmentation. The performance of the system changes significantly with variation in padding before CNN prediction. The CNN algorithm was trained on a dataset using manual annotations of the moths. These do not surround the moths nearly as closely as the bounding boxes placed by the blob segmentation (see figure 6 and figure 9). Thus, the crops seen during training and during prediction differ in size. It is likely that accuracy could be improved by training on images cropped by the blob segmentation.
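The padding step can be sketched as follows. The box format (x_min, y_min, x_max, y_max), the padding value and the image size are assumptions made for illustration; the padding actually used before CNN prediction is not specified in this section.

```python
def pad_box(box, pad, image_size):
    """Enlarge a bounding box by `pad` pixels on every side.

    `box` is (x_min, y_min, x_max, y_max) in pixels and `image_size`
    is (width, height). The result is clipped to the image borders.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    return (
        max(0, x_min - pad),
        max(0, y_min - pad),
        min(width, x_max + pad),
        min(height, y_max + pad),
    )


# Hypothetical tight box from blob segmentation in a 1920x1080 image.
print(pad_box((10, 20, 110, 90), pad=15, image_size=(1920, 1080)))
# (0, 5, 125, 105)
```

Choosing `pad` so that the padded crop resembles the looser manual annotations is one way to reduce the mismatch between training and prediction crops.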
As a proof of concept, the proposed light trap and MCC algorithm are promising as an automated system for insect observations during night hours. In order to conduct better statistical analysis and draw empirical conclusions about the proposed system, more recordings from different observations and locations are needed. In addition, the CNN classification algorithm can be improved by ensuring that the training dataset is annotated correctly.

Conclusion
In this paper, we have presented a light trap and computer vision system to monitor live insects and moth species. The automated light trap was composed of a web camera, a Raspberry Pi computer and special illumination to attract insects during the night. The camera and illumination were adjusted to achieve high-resolution, high-quality images of eight different moth species. The light trap recorded more than 250,000 images during 48 nights over the summer of 2019, with an average of 5,675 images per night.
A customized convolutional neural network was proposed and trained on 864 labeled images of live moths. This allowed us to detect and classify eight different species. The network achieved a high validation F1-score of 0.96.
The algorithm measured an average classification and tracking F1-score of 0.83 and a tracking detection rate of 0.79. This result was based on an estimate of 83 individual moths observed during one night with insect activity in 122 minutes collecting 6,000 images.
Furthermore, the paper identified potential improvements to the system, highlighting the amount of training data for the presented CNN species classification model as a focus area.
Overall, the proposed light trap and computer vision system showed promising results in nondestructive and automatic monitoring of moths and classification of species. It should be considered a viable alternative to traditional methods, which typically require tedious manual labor (i.e. visiting the trap several times in a season for observation) and often kill rare species of insects.