A Multilane Tracking Algorithm Using IPDA with Intensity Feature

Detection of multiple lane markings on road surfaces is an important aspect of autonomous vehicles. Although a number of approaches have been proposed to detect lanes, detecting multiple lane markings in a consistent manner, particularly across a large number of frames and under varying lighting conditions, is still a challenging problem. In this paper, we propose a novel approach for detecting multiple lanes across a large number of frames and under various lighting conditions. Instead of resorting to the conventional approach of processing each frame to detect lanes, we treat the overall problem as a multitarget tracking problem across space and time, using the integrated probabilistic data association filter (IPDAF) as our basis filter. We use the intensity of the pixels as an augmented feature to correctly group multiple lane markings using the Hough transform. By representing these extracted lane markings as splines, we then identify a set of control points, which becomes a set of targets to be tracked over a period of time, and thus across a large number of frames. We evaluate our approach on two different fronts, covering both model- and machine-learning-based approaches, using two different datasets, namely the Caltech and TuSimple lane detection datasets, respectively. When tested against model-based approaches, the proposed approach offers as much as 5%, 12%, and 3% improvements in the true positive, false positive, and false-positives-per-frame rates, respectively, compared to the best alternative approach. When compared against a state-of-the-art machine learning technique, particularly a supervised learning method, the proposed approach offers 57%, 31%, 4%, and 9× improvements in the false positive rate, false negative rate, accuracy, and frame rate, respectively. Furthermore, the proposed approach retains explainability; in other words, the causes of its actions can easily be understood or explained.


Introduction
Advanced driving assistance systems (ADAS) are no longer an optional or luxurious component in modern vehicles [1,2]. Instead, they are becoming a core component, especially with the migration towards autonomous vehicles. ADAS covers a number of varying functionalities, such as lane departure warning (LDW), lane keep assist (LKA), lane change merge (LCM), adaptive cruise control (ACC), collision detection and avoidance (CD), night vision, and blind spot detection, to mention a few [1,[3][4][5][6][7][8][9][10][11][12][13]. The overall functionality of ADAS is underpinned by a machine vision component that must understand the surroundings, particularly by extracting lane boundaries and markings on roads. With ADAS becoming a core component, it is essential that potential errors arising out of the machine vision component be as low as possible. However, correctly, consistently, and constantly extracting lane markings across a range of weather conditions is not trivial. In addition to this, varying lane marking standards, obscure lane markings, splitting and merging of lanes, and shadows of vehicles and objects exacerbate this problem even more [14][15][16][17]. We show a number of such examples of road markings in Figure 1 (adopted from [18]).
The road markings can be extracted using image-based sensors, such as monocular or stereo vision cameras, or using LIDAR sensors. Among these, using monocular cameras is the most cost-effective approach, although they lack depth information. Stereo vision cameras can, however, provide the capability to infer depth information and hence the ability to reconstruct three-dimensional scenarios for improved functionality, such as collision detection [4]. LIDAR sensors exploit the fact that road markings are painted using retroreflective paints; the reflected returns can then be used to extract the lane markings. However, LIDAR sensors are, similar to stereo vision cameras, far more expensive than monocular cameras. As such, seeking a trade-off between performance, reliability, and cost is an important activity in the design process. Treating cost effectiveness as the primary objective, we assume that the lane detection is performed on images obtained from a monocular camera system.
The literature on lane detection and tracking is considerably rich, with a variety of techniques covering various application domains, including LDW, LKA, LCM, and CD. Some of these perform lane marking detection (for example, [19,20]) and track the markings, while the rest perform only the detection (for example, [5,13,21]). In particular, we focus on techniques that rely solely on images or videos obtained from monocular vision cameras for lane marking detection followed by tracking. For instance, vision-based lane detection has been used for LDW in [5,11,12,14,22]. These approaches predominantly rely on information such as color, color cues, and edge-specific details. Color cues exploit the color contrast between the lane markings and the road. However, the conditions have to be favorable for the differences in contrast to be realized by the lane marking algorithms. Conditions such as illumination, back lights, shadows, night lights, and weather conditions, such as rain and snow, can significantly affect the performance of color-cue-based algorithms. One approach to overcome these limitations is to use the Hough transform along with color cues [23]. However, the Hough transform works well only when the potential candidate lines are straight and visible enough. Although some preprocessing can improve the detection [21], consistently differentiating lane boundaries from other artifacts, such as shadows and vehicles, is a challenge.
Inverse Perspective Mapping (IPM) is another approach to determine the lane boundaries in LDW systems. The central idea behind IPM is to remove the perspective distortion of lines that are parallel in the real world [11,18,22]. In order to do this, images are transformed from the camera view to a bird's eye view using camera parameters. During the transformation, the aspect ratios are retained so that gaps or widths between lane boundaries are transformed appropriately. As such, the lane boundaries are still detectable in the transformed space. However, there are several downsides to this approach. Primarily, IPM is often used with fixed camera calibration parameters, and this may not always be optimal, owing to the surface conditions [24]. Furthermore, these transformations are computationally intensive [25], and as such, the real-time utility of these approaches needs careful implementation. Although these issues can reasonably be overcome by resorting to various techniques, such as calibration and systems with adequate compute power [24][25][26], the main limitation is that the transformation is sensitive to obstacles on the road, such as vehicles, and to terrain conditions [27].
As lane markings are a pair of parallel lines, each pair should pass through a vanishing point [28]. This property can be exploited to eliminate and filter out the line segments that do not constitute lanes [29][30][31]. A number of approaches have been proposed in the literature for tracking a single lane, such as [14,17,[31][32][33][34][35]. In [17], color, gradient, and line clustering information are used to improve the extraction of lane markings. In [36], an approach for lane boundary detection based on random finite sets and PHD filter is proposed as a multitarget tracking problem. In [32], a multilevel image processing and tracking framework is proposed for a monocular camera-based system. As such, it heavily relies on preprocessing of frames. Our approach also uses splines, but our tracking approach is significantly different to the one in [32]. In [33,34], techniques for personalized lane-change maneuvering are discussed. They use driver-specific behaviors, collected as part of the system, to improve the results. Although this can improve the results, such approaches are practically difficult to implement. In [35], the lane tracking is simplified by forming a midline of a single lane using B-splines. Although this approach may be useful over a short distance, conditions such as diverging lanes or missing lane markings will render the approach susceptible to bad approximations of midlines. This can easily lead to suboptimal results.
With recent advances in machine learning, particularly with supervised learning techniques such as deep learning, it is possible to engineer machine learning models to recognize lane markings. This possibility has been demonstrated in the literature [37][38][39][40][41]. In [38], a special convolutional neural network (CNN), termed spatial CNN (SCNN), was constructed for extracting the spatial correlation between objects in an image, with the view of using that to establish the relative positioning of lane markings. In [39], the LaneNet, consisting of two deep neural networks, was constructed for lane detection. One of the networks detects lane marking edges, whereas the other network groups and clusters lane markings. The lane extraction work described in [38] relies on several integrated techniques, such as the YOLO framework [42] for object detection and convolutional patch networks (CPN) [43,44] for detecting road surfaces and regions of interest. Although these supervised techniques can offer good results, the approach suffers from a number of issues. First, supervised learning techniques rely on labeled datasets or ground truth information. Although this appears to be trivial, these labels have to be made for each and every pixel that is to be classified as lane marking. Second, the real success of deep learning is based on the volume of data upon which the model is trained. Although the process of securing several thousands of images with labeled pixels can be automated, it is a time-consuming process. Third, training requires a substantial amount of compute time. Fourth, although various supervised learning techniques can offer good accuracy rates, the explainability of machine learning models is still an emerging area of research, and unlike general algorithms, deep neural networks lack the rigor of explainability. This is a serious concern where lives could be at risk. Finally, the accuracy rates are never sustained across different datasets. As such, the training process is a continuous one.
This paper aims to develop a tracking technique for a multilane tracking problem based on images/videos captured from a single camera mounted in front of a vehicle. The algorithm is designed to be real-time, robust, and cost-efficient in terms of sensors. To this end, we first model each lane marking as a spline, defined by a finite set of control points. By treating these splines (and thus the control points) as targets whose motions are defined during frame transitions, we develop a multitarget tracking problem with an appropriate motion model. The multilane tracking technique proposed in this paper is a careful amalgamation of several existing real-time and robust ideas into a pipeline, together with the new ideas outlined in our contributions.
We utilize the probabilistic Hough transformation [45] to perform an initial extraction of lane markings. This is then followed by a series of algorithms prior to treating the extracted lanes as targets. The first algorithm in the pipeline performs an initial grouping of extracted line segments into different splines. This is then followed by an algorithm, which encapsulates a number of subalgorithms, to differentiate the clutter from lane boundaries in a robust manner and to manage the evolution of trajectories of splines being tracked. We then devise a multitarget tracking algorithm based on a motion model that assumes that the transitions of splines across frames are at a constant rate. The overall solution can be considered as a carefully engineered pipeline of algorithms. In doing this, we make the following key contributions:

1. We develop an algorithm, based on a maximum a posteriori (MAP) estimator [46], to group and cluster different lane segments into unknown spline groups;
2. We find the intensity likelihood ratio of line segments and augment this ratio as a feature in clustering and in the probabilistic data association (PDA) filter [47] to distinguish lane markings from clutter; and
3. We propose a new, real-time, multiple extended-target tracking (targets whose shape and position change simultaneously) algorithm that operates in the presence of clutter, based on the PDA filter, to distinguish and track multiple spline-shaped lane-lines.
The remainder of this paper is organized as follows: In Section 2, we formulate the overall problem, and discuss our approach for solving each of the subproblems. This is then followed by Section 3, in which we discuss a set of preprocessing steps on the input images prior to using our framework of methods. The aspect of clustering and estimating control points to describe the splines, and two of our key algorithms for this purpose are discussed in Section 4. We then describe the techniques to track multiple splines using the IPDA filter in Section 5. The results of our evaluations are then presented in Section 6, and we discuss conclusions in Section 7.

Problem Formulation
To facilitate the process of deriving an overall approach and suitable algorithms, we use i as the index for the control points, i ∈ {0, 1, . . . , N}, j as the lane index, and k as the frame index. For instance, the parameter x_{i,j,k} denotes the ith control point for the jth lane on the kth frame. The notations used in this manuscript are given in Table 1.
With these notations in place, the overall problem of lane identification across a sequence of frames can be reformulated as follows:

1. Identification of Control Points: For a set of extracted lane markings on frame k, identify a set of control points that uniquely describes each of the lane markings (assuming that lane markings do not cross each other);
2. Trajectory Management: Each candidate control point belongs to one of the lane markings, and over a period of time (across frames), these control points form distinctive trajectories if they are managed well. In order to manage these trajectories, which is a prerequisite for multitarget tracking, it is crucial to associate the control points with the trajectories they belong to; and
3. Multitarget Tracking: Using the control points in each of the trajectories as pseudomeasurements, formulate a multitarget tracking algorithm for further extraction and identification of lane markings on the frames yet to be seen.

Our Approach
In addressing the overall problem outlined in Section 2.1, we decompose it into a number of subproblems, each of which handles a specific aspect of the overall lane detection problem across frames. The overall agenda is to form an automated processing pipeline, where each stage of the pipeline is underpinned by one or more algorithms. This processing pipeline is shown in Figure 2. Each of these stages is discussed in the following sections.

Preprocessing
The key aspects of the preprocessing stage include edge detection, probabilistic Hough transform, and extraction of region of interest. We also use noise filtering before each of these stages to minimize the impact of noise amplification in the process.

Edge Detection
The basic edge detection in images is based on the convolution of a predetermined kernel with an image [48]. In our case, each frame forms an image. However, this basic approach to edge detection, which is a gradient-finding exercise, picks up the gradients of the noise along with the lane markings. Although basic noise filtering, such as averaging or median filtering, can minimize these effects, it does not guard the edge detection against these artifacts. For this reason, we used Canny edge detection [48], which incorporates Gaussian filtering as a precursor step to the gradient calculation. More specifically, we used two 3 × 3 kernels, namely a Gaussian kernel H and an edge detection kernel K. For each input frame F_in, we calculated the output frame F_out as F_out = K ∗ (H ∗ F_in), where H ∗ F_in is the noise-filtered version of F_in, and the ∗ operator denotes convolution. We have also predefined the values of H (by fixing the variance).
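The smoothing-then-differentiation step F_out = K ∗ (H ∗ F_in) can be sketched as follows. The kernel values here are illustrative only (a 3 × 3 binomial approximation of a Gaussian, and a Sobel-style horizontal-gradient kernel), not necessarily the exact kernels used in our implementation:

```python
import numpy as np

def convolve2d(image, kernel):
    """Plain 'valid'-style 2-D convolution (no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * flipped)
    return out

# Illustrative 3x3 Gaussian smoothing kernel H (fixed variance)
# and a Sobel-style edge detection kernel K.
H = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
K = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def edge_response(frame):
    """F_out = K * (H * F_in): smooth first, then differentiate."""
    return convolve2d(convolve2d(frame, H), K)
```

Because convolution is linear, smoothing before differentiation is equivalent to convolving with a single combined kernel, which is what makes the Canny-style pipeline robust to pixel noise without extra passes over the frame.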

Probabilistic Hough Transform
Although the edge detection process brings out the edges in each frame, these edges do not have to correspond to straight lines in the real image, particularly the lanes in the partitioned tiles. In other words, among the many pixels forming different edges, the key interest is in the pixels that make up straight lines, i.e., lanes. An early implementation is due to [49], where an accumulator matrix captures the intersections of various straight lines in an image. This matrix is then exhaustively searched for a maximum (and other decreasing maxima) to find straight lines. As such, the original Hough transformation process is computationally intensive. In our case, we adopted the probabilistic version of the Hough transform [50], where only a subset of the edge points is selected through a random sampling process, particularly when updating the accumulator matrix. With this approach, we reduce the amount of computation by not considering all possible measurements.
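The random-sampling idea can be sketched as below. The accumulator resolution, sampling fraction, and peak-picking here are illustrative parameters, not the exact configuration used in our pipeline:

```python
import numpy as np

def prob_hough_peak(edge_points, sample_frac=0.3, n_theta=180, n_rho=200, rng=None):
    """Vote a randomly sampled subset of edge points into a (rho, theta)
    accumulator and return the strongest straight line as (rho, theta).
    edge_points: sequence of (x, y) pixel coordinates."""
    rng = rng or np.random.default_rng(0)
    pts = np.asarray(edge_points, dtype=float)
    k = max(1, int(sample_frac * len(pts)))
    pts = pts[rng.choice(len(pts), size=k, replace=False)]  # random subsampling

    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = np.abs(pts).sum(axis=1).max() + 1.0
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in pts:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)  # rho = x cos t + y sin t
        bins = ((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[bins, np.arange(n_theta)] += 1
    r_bin, t_bin = np.unravel_index(np.argmax(acc), acc.shape)
    rho = r_bin / (n_rho - 1) * 2 * rho_max - rho_max
    return rho, thetas[t_bin]
```

Only the sampled points vote, so the accumulator update cost drops roughly in proportion to the sampling fraction, while a sufficiently long line still produces a dominant peak.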

Extraction of Regions of Interest
Although the probabilistic Hough transform can filter out unnecessary edges and lead to straight lines, the extracted straight lines do not have to represent only the lanes. In fact, the extracted straight lines can be anything, including lanes, edges of vehicles, lampposts, and buildings. An easier approach to filter out irrelevant components is to use vanishing points. Each pair of lane boundaries, unlike other straight lines in a frame, should share a vanishing point.
Vanishing points can be extracted by embedding an additional process after the probabilistic Hough transform process outlined above. Sinusoids that pass through all of the maxima points in Hough space should correspond to the vanishing point in the image plane. In particular, we extract vanishing points for each partition of the image. We then use these vanishing points to eliminate irrelevant straight line segments in the image and to form regions of interest. In addition to this, the area outside the vanishing line has no information that can aid lane boundary tracking, and can be removed.
In Figure 3, we show the outputs of different stages of the preprocessing.

Clustering and Identification of Control Points
Identification of the lane-line (spline) control points proceeds in three steps:

1. Partitioning the frame, finding line segments, and computing their intensity likelihood ratios;
2. Predicting control points and validating measurements; and
3. Updating the final control points using the MAP estimator.

Frame Partitioning
Once the preprocessing is over, the next stage of the pipeline extracts the control points. Although we intend to identify a set of control points to model the lanes as splines, the process is much simpler if the splines are small in size and straight in shape. However, the extracted lane markings are seldom straight. One approach to address this issue is to partition each frame into n horizontal tiles, each with an experimentally determined height, so that lanes on each partition are near straight. Figure 4 shows the same image partitioned in two different ways: for two different values of n (namely n = 3 and n = 4), and with different partition heights.
However, considering the perspective characteristics of the camera and the distance of lanes from the camera, it is beneficial to have the heights of the partitions in increasing order toward the bottom of the frame. We experimentally determined that the extracted information is maximized for n = 3, such that h_1 = (1/7)H, h_2 = (2/7)H, and h_3 = (4/7)H, where H is the overall height of the region of interest (ROI). We use this configuration of n and h_i (i = 1, 2, 3) throughout the study conducted in this paper.
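The tile boundaries for this configuration can be computed as a minimal sketch (the ratios 1/7, 2/7, 4/7 are those stated above; the helper name is ours):

```python
def partition_heights(H, ratios=(1 / 7, 2 / 7, 4 / 7)):
    """Split the ROI height H into horizontal tiles whose heights grow
    toward the bottom of the frame (ratios 1/7, 2/7, 4/7 for n = 3)."""
    bounds, y = [], 0.0
    for r in ratios:
        h = r * H
        bounds.append((y, y + h))  # (top, bottom) of this tile, from ROI top
        y += h
    return bounds

# e.g. for a 280-pixel ROI: tiles of height 40, 80, and 160 pixels
```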

Intensity Likelihood Ratio of a Line Segment
For each of the partitions, we apply the probabilistic Hough transform to extract the lane markings. However, the extraction process, like most edge detection techniques, produces a number of broken, small, noncontinuous, and irrelevant line segments. As such, one of the key challenges following the extraction process is to distinguish the lane markings from background noise and clutter. To render a more robust, high-fidelity approach to clutter and noise management, we augment the extractions with the underlying intensity values. More specifically, we define the number of edge points that lie in an extended line segment s (s = 1, . . . , n) as the intensity. The intensity can be extended to cover a set of line segments or a number of pseudomeasurements belonging to a curve. The intensity of an extended line segment is represented as a likelihood ratio, which we define below.
Let p_0(f_j) be the probability density function (PDF) of the noise only, and p_1(f_j) be that of the target-originated line-segment detections before thresholding. Furthermore, let D_0 and D_1 be the scale parameters for false alarms and clutter, and for the target, respectively. These scale parameters depend on the minimum number of points used in the Hough transform. The noise-only and target-originated measurement density functions follow, where f_j ≥ 0 is the intensity of candidate measurement j. Furthermore, let γ = γ_det be the threshold to declare a detection, from which the probabilities of detection (P_D) and false alarm (P_FA) can be computed. Although the probability of detection, P_D, can be increased by lowering γ, doing so also increases P_FA; hence, the choice of γ cannot be arbitrary. With these, the corresponding probability density functions after thresholding become p_γ^0(f_j) and p_γ^1(f_j), the PDFs of the validated measurement ψ_j (for j = 1, . . . , m) that is due to noise only and that originates from the target, respectively.
Considering Equations (5) and (6), the line-segment intensity likelihood ratio e_j, which is the likelihood ratio of measurement ψ_j with an intensity of f_j edge pixels originating from the target rather than from clutter, can be defined as e_j = p_γ^1(f_j) / p_γ^0(f_j).
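As an illustration only, under the assumption that the pre-threshold intensity densities are exponential with scale parameters D_0 and D_1 (an assumption on our part; Equations (5) and (6) in the paper give the exact forms), the ratio can be computed as:

```python
import math

def intensity_likelihood_ratio(f, D0, D1, gamma):
    """Likelihood ratio e_j = p1^gamma(f) / p0^gamma(f) for a segment with
    intensity f (number of edge pixels), ASSUMING exponential pre-threshold
    densities p_i(f) = exp(-f / D_i) / D_i with scale parameters
    D0 (clutter/false alarm) and D1 (target)."""
    if f < gamma:
        return 0.0                       # below threshold: never validated
    P_D = math.exp(-gamma / D1)          # tail mass of p1 above gamma
    P_FA = math.exp(-gamma / D0)         # tail mass of p0 above gamma
    p1 = math.exp(-f / D1) / D1 / P_D    # post-threshold target density
    p0 = math.exp(-f / D0) / D0 / P_FA   # post-threshold clutter density
    return p1 / p0
```

With D_1 > D_0 (target segments accumulate more edge pixels than clutter), the ratio grows with f, so long, well-supported segments are favored in the association step.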

Pseudomeasurements
The pseudomeasurement set Z^τ_{k,j} is the set of control points for track τ and lane j in frame k. Furthermore, let ψ^k_s(j) denote the extended line segment in section s, at time step k, for lane j; that is, ψ^k_s(j) abstracts away a number of pseudomeasurements for each (extended) line segment. Each such measurement is a two-element vector, with one element capturing the pseudomeasurement Z^τ_{k,j} and the other representing the intensity of the extended line segment as a likelihood ratio, e_j(k).

MAP Estimator for Measurements with Intensity Feature
Although we expect the pseudomeasurements to closely model the lane-lines, in reality, a number of factors make this process challenging. Examples include, but are not limited to, missed detections, the nondeterministic nature of the preprocessing, and noisy measurements due to clutter. Therefore, it is essential to model these imperfections as part of the process.
To simplify the analysis and derivation, we assume that measurements originating from targets at a particular sampling instant are received by the sensor only once, with probability of detection P_D. The measurement equation follows, where j = 1, . . . , m, w(j) is the measurement noise, and x = [x_1, x_2] is the unknown value that we aim to estimate in the presence of the measurement noise. We also assume that the measurement noise is independent, zero-mean, and Gaussian distributed with covariance R; in our case, various preprocessing stages, such as thinning and the Hough transform, contribute towards R. Thus, w(j) ∼ N(0, R). Because of the condition of the road and the perspective effect of the camera lens, for the values of σ^2_11 and σ^2_12 we would expect more deviation in the bottom part of the frame, which is closer to the camera, than in the top. We also assume the measurements ψ to be normally distributed around x with covariance R, and the prior p(x) to be normally distributed around the predicted measurement x̄ with covariance Q; thus, ψ ∼ N(x, R). Again, similar to R, the perspective effects of the camera cause the values of σ^2_01 and σ^2_02 to be skewed toward the bottom part of the frame. Furthermore, the covariance Q is often linked to the curvature κ of the road; we assume the maximum standard curvature of highways as a constant parameter when forming the posterior measurement density. Since the measurement and prior noises are assumed to be Gaussian, for a single measurement (i.e., m = 1), ψ(1) = ψ_1 admits a closed-form posterior, and for a Gaussian distribution, the mean is the optimal maximizing value x̂. For m > 1, the optimal maximized value can be derived using the total probability theorem and the combined residuals, where β_j is the association probability, which we define below (see Appendix A for derivations).
where P_D and P_G are the probabilities of detection and gating, respectively, m_k is the number of validated detections at time k, e_j is the intensity of the extended line segment expressed as a likelihood ratio, and L_j is the probability density function for the correct measurement without the intensity feature, defined with S = R + Q, where x̄ is the prior information.

Clustering Algorithm for Finding Control Points
Ideally, each partition will have a sufficient number of full measurements ψ^k(j) so that a spline can be fitted over those measurements. However, in reality, this is seldom the case. The associated challenges are dealt with here using an algorithm that estimates the control points based on the available set of measurements. In particular, we use the MAP estimator (MAPE) to find the optimal control points. These aspects are handled by two algorithms, Algorithms 1 and 2, which are outlined and discussed in detail below.
Algorithm 1 handles each partition separately, but extends the line segments into the next partition wherever needed. For a given partition s, it estimates the control points for each line, x_{i,s}, using the curvature κ. Then, the overall set of lines L is used to estimate the control points for that partition using the MAP estimator (see Algorithm 2). These control points are accumulated into A as a list. Notice that the Predict() function finds predicted control points for each individual line segment l using the curvature vector κ. Algorithm 2 combines both the data association and the posterior PDF to adjust the estimated control points: each association probability is computed as β_j ∝ L_j e_j, and the estimate is adjusted using the combined weighted residual ∑_{j=1}^{m} β_j r_j. In particular, it uses the IPDA-specific target-to-track association probabilities (covering both track existence and non-existence), β_0 and β_j, for finding the control points based on the candidate control points x̄ and the measurements Ψ. More specifically, the Validated-Measurements() function uses a normalized distance to validate the line segments belonging to each spline. We show a sample outcome of these algorithms in Figure 5. We first show two endpoint measurements (ψ^1_1, ψ^1_2) and (ψ^2_1, ψ^2_2) (Figure 5a). These points are then corrected using the above algorithms to output the corrected control points x̂^1_1 and x̂^1_2 (Figure 5b).
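A scalar sketch of this control-point adjustment follows. The association weight β_j ∝ L_j e_j and the standard Gaussian-fusion gain Q/(Q + R) used here are our simplifications of Algorithm 2, which gives the exact multivariate form:

```python
import math

def map_update(x_bar, Q, measurements, R, e):
    """One MAP control-point adjustment (scalar sketch, in the spirit of
    Algorithm 2): association weights beta_j ~ L_j * e_j, followed by a
    Gaussian-fusion correction of the prior x_bar using the combined residual.
    Q, R: prior and measurement variances; e: intensity likelihood ratios."""
    S = R + Q  # innovation variance
    L = [math.exp(-0.5 * (z - x_bar) ** 2 / S) / math.sqrt(2 * math.pi * S)
         for z in measurements]
    w = [Lj * ej for Lj, ej in zip(L, e)]      # spatial x intensity weight
    C = sum(w)
    beta = [wj / C for wj in w]                # normalized association probs
    residual = sum(b * (z - x_bar) for b, z in zip(beta, measurements))
    return x_bar + (Q / S) * residual          # Gaussian-fusion gain Q/(Q+R)
```

A distant clutter segment with a small intensity ratio e_j receives a near-zero β_j, so the estimate is pulled almost entirely toward the well-supported measurement.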

Multilane Tracking Using IPDA
The PDA filter is similar to the Kalman filter in its dynamic model and prediction steps, but in the update step, it uses a sum of weighted measurements to deal with clutter. We use the integrated PDA (IPDA) filter [51] in a multitarget setting in a new way, combining initialization and track management, to track an unknown number of extended targets (splines), with the augmented intensity as a new feature inside the association likelihood to deal with clutter (Figure 6).

Preliminaries
As stated above, we assume that the control points move at constant velocity, and thus our dynamic model is a constant velocity model. With this, the state vector for track τ in frame k follows, and the state evolution and measurement updates for frame (time) index k are expressed in terms of F, G, and H, the state transition, control input, and observation matrices, respectively, where ν and ω are the process and measurement noises, respectively. In our case, we preset F and G to the standard constant-velocity forms. To minimize the variability of the outcomes of the lane extraction process, or its dependence on the highly variable surrounding context, the selection process for the control points has to be adaptive enough to account for the context information. For instance, when comparing the lane extraction process on roads with varying terrain conditions as opposed to highways, it is desirable for the process noise to be more amenable to high variance in the spatial distribution of control points. One approach to bring adaptiveness into this process is to introduce adaptive process and measurement noises. Furthermore, due to the partitioning of frames and the perspective effects of the camera, control points in the bottom-most partition are closer to the camera than control points in the top-most partition of a frame. As such, despite the constant velocity model, the rates of spatial variation of the control points differ. Considering a different noise variance for the acceleration of each control point addresses this issue. As such, the diagonal form of the process noise ν_k can be expressed in terms of ẍ^τ_{i,k} for (i = 1, 2, 3, 4), the maximum acceleration of each control point. The process noise covariance Q can then be expressed using diag_i(Q), the ith subdiagonal of Q, with i = 0 denoting the main diagonal, and −i and +i denoting the ith subdiagonals below and above the main diagonal, respectively. Similarly, the measurement noise covariance R follows.
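For a single control-point coordinate, the constant-velocity prediction can be sketched as follows. The frame period T and the per-coordinate state layout are illustrative; the paper stacks all four control points into one state vector:

```python
import numpy as np

T = 0.04  # illustrative frame period (25 fps); an assumption, not the paper's value

# Constant-velocity model for one control-point coordinate:
# state [position, velocity]; F and G take the standard CV forms.
F = np.array([[1.0, T],
              [0.0, 1.0]])
G = np.array([[0.5 * T * T],
              [T]])            # maps (maximum) acceleration into the state
H = np.array([[1.0, 0.0]])     # we observe position only

def predict(x, P, Q):
    """Kalman prediction step for one control point."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred
```

Using a larger acceleration bound ẍ for the bottom partition inflates its entries of Q, which is how the perspective-dependent variation described above enters the filter.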

Multilane Tracking Using IPDAF
The IPDA filter [51,52] offers augmented information for track maintenance, in addition to the state estimation of tracks. Instead of assuming the existence of targets as a hardwired probability, the IPDA filter offers a choice of incorporating a track quality measure into the tracking process.
In the context of the PDA algorithm, for each validated measurement, the association probability (with each track) is calculated. To this end, the association probability β^τ_i(k) accounts for the probability of associating measurement i with track τ at time k, the feature intensity f^τ_i(k), and the likelihood ratio e^τ_i(k) of associating a line with a feature measurement. This probabilistic information is used to associate new measurements with the targets. Given a linear dynamic model and an IPDAF based on [47], the state update becomes x̂(k|k) = ∑_i β_i(k) x̂_i(k|k), (29) where x̂_i(k|k) is the updated state conditioned on the event that the ith validated measurement is correct, and β_i(k) is the probability of associating measurement i, with feature value f_i(k), to the track. The association probability for a set of m_k gated or validated measurements with features f_i(k) can be expressed in terms of the events described in Appendix A. In our case, we assume that each detected lane boundary measurement i has an intensity feature f_i(k). With reference to (A13), the feature likelihood e_i(k) can be incorporated into the PDA algorithm as in [53], for i = {0, 1, . . . , m(k)}. The overall IPDAF algorithm here embodies a traditional PDAF algorithm with special initialization and termination steps. This is outlined in Algorithm A1 in Appendix B. The underlying aspects of the multilane tracking follow the principles of multitarget tracking as in [47], and are discussed in the following subsections.
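The weighted combination in (29) can be sketched for a scalar state as below. The normalization (a parametric clutter model with β_0 covering the "no measurement is correct" hypothesis) is the standard PDAF form and is our simplification; (31) and (A13) give the exact expressions with the intensity feature:

```python
def pda_update(x_pred, innovations, likelihoods, e, P_D, P_G, clutter_density, gain):
    """PDA-style combination sketch: beta_0 covers 'no measurement correct';
    each validated measurement i is weighted by its spatial likelihood L_i
    times the intensity likelihood ratio e_i (the augmented feature)."""
    m = len(innovations)
    scores = [likelihoods[i] * e[i] * P_D / clutter_density for i in range(m)]
    b0 = 1.0 - P_D * P_G                      # none-correct hypothesis weight
    C = b0 + sum(scores)
    beta0 = b0 / C
    beta = [s / C for s in scores]            # association probabilities
    combined = sum(b * v for b, v in zip(beta, innovations))
    return x_pred + gain * combined, beta0, beta
```

Because the intensity ratio e_i multiplies the spatial likelihood, a geometrically plausible but weakly supported (low-intensity) segment contributes little to the combined innovation.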

Track Initialization
The track initialization process (for each track) can rely on one or two seed points. In the one-point initialization method, the position can be initialized from a single observation, with a zero velocity vector [54]. This initialization allows the standard gating to be used during the following time step.
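A minimal sketch of the one-point initialization follows. The $(v_{\max}/2)^2$ velocity-variance heuristic is a common choice in one-point initialization, and the function and parameter names are illustrative, not from the paper:

```python
def init_track_one_point(z, r_pos, v_max):
    """One-point track initialization: the position comes from the
    single observation z, the velocity is set to zero, and the
    velocity variance is inflated to cover the maximum expected
    motion so that standard gating works on the next step."""
    x = [z, 0.0]                         # [position, velocity]
    P = [[r_pos, 0.0],
         [0.0, (v_max / 2.0) ** 2]]      # common (v_max/2)^2 heuristic
    return x, P
```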

Measurement Prediction
For each track, the state vector, the measurement, and the state covariance matrix are predicted ahead as in standard Kalman filtering, i.e., with the assumption of a Gaussian posterior for $p(\mathbf{x})$,

$$\hat{\mathbf{x}}(k|k-1) = F\,\hat{\mathbf{x}}(k-1|k-1), \qquad \hat{\mathbf{z}}(k|k-1) = H\,\hat{\mathbf{x}}(k|k-1), \qquad P(k|k-1) = F\,P(k-1|k-1)\,F^T + Q.$$
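The prediction step for a single control point's [position, velocity] state can be sketched as follows (`predict` is a hypothetical helper; the 2×2 matrices are kept as plain nested lists for self-containment):

```python
def predict(x, P, F, Q):
    """Standard Kalman prediction for a 2-state track:
    x(k|k-1) = F x(k-1|k-1) and P(k|k-1) = F P F^T + Q."""
    xp = [F[0][0] * x[0] + F[0][1] * x[1],
          F[1][0] * x[0] + F[1][1] * x[1]]
    # FP = F * P, then Pp = FP * F^T + Q
    FP = [[sum(F[i][m] * P[m][j] for m in range(2)) for j in range(2)]
          for i in range(2)]
    Pp = [[sum(FP[i][m] * F[j][m] for m in range(2)) + Q[i][j]
           for j in range(2)] for i in range(2)]
    return xp, Pp
```

For example, with unit frame period and no process noise, a track at position 0 with velocity 1 predicts to position 1 with the covariance spread by $F P F^T$.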

Measurement Gating
For each track, a validation gate is set up around the predicted measurement to select candidate measurements for data association. The size of the validation gate is determined by the innovation covariance and the measurement noise. As per (38), at most one of the validated measurements can be assigned to the target. Measurements outside the gate are assumed to be false alarms or measurements belonging to other targets. The validation region is elliptical:

$$\mathcal{V}(k) = \left\{ \mathbf{z} : [\mathbf{z} - \hat{\mathbf{z}}(k|k-1)]^T S(k)^{-1} [\mathbf{z} - \hat{\mathbf{z}}(k|k-1)] \le \gamma \right\},$$

where $n_z$ is the dimension of the measurement vector, representing the degrees of freedom, and $\gamma$ is the gating threshold. The gating threshold $\gamma$ is obtained from the chi-squared distribution, parameterized by the gating probability $P_G$ and by the degrees of freedom $n_z$ [46].
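The elliptical gate test can be sketched as follows. `in_gate` is a hypothetical helper, and the $\gamma \approx 9.21$ value used in the example is the tabulated chi-squared threshold for $n_z = 2$ and $P_G = 0.99$ (looked up, not computed here):

```python
def in_gate(z, z_pred, S_inv, gamma):
    """Elliptical validation gate: accept measurement z when the
    Mahalanobis distance (z - z_pred)^T S^-1 (z - z_pred) <= gamma.
    S_inv is the inverse innovation covariance (2x2 here)."""
    v = [z[0] - z_pred[0], z[1] - z_pred[1]]     # innovation
    d2 = (v[0] * (S_inv[0][0] * v[0] + S_inv[0][1] * v[1])
          + v[1] * (S_inv[1][0] * v[0] + S_inv[1][1] * v[1]))
    return d2 <= gamma
```

With an identity innovation covariance, a measurement at distance $\sqrt{2}$ from the prediction passes the 99% gate, while one at distance $\sqrt{32}$ does not.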

Data Association
An incoming measurement $z_i(k)$ at time index $k$, with feature $f_i(k)$, is associated with a track $\tau$ based on the association probability given by (31). Here, two likelihood ratios are of interest: $L_i(k)$ and $e_i(k)$. The former is the likelihood ratio of the incoming measurement $z_i(k)$, defined as

$$L_i(k) = \frac{\mathcal{N}\left[z_i(k);\ \hat{z}(k|k-1),\ S(k)\right] P_D}{\lambda},$$

where $\lambda$ is the uniform density of the location of false measurements and $P_D$ is the probability of detection. The second quantity of interest, $e_i(k)$, is the likelihood ratio of measurement $z^\tau_i$ with feature $f^\tau_i$ for track $\tau$, defined through the feature likelihoods in Appendix A. Both of these measurements, $z_i(k)$ and $z^\tau_i$, are expected to originate from the target and not from the clutter.
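A minimal sketch of the feature-augmented association step, combining the $L_i(k)$ and $e_i(k)$ ratios into normalized association probabilities, is given below. `association_probs` is a hypothetical helper, and the "all false" weight uses a simplified $1 - P_D P_G$ normalization rather than the exact constants of (31):

```python
def association_probs(likelihoods, features_lr, P_D, P_G):
    """Feature-augmented PDA association probabilities (sketch).

    likelihoods[i] is L_i(k), features_lr[i] is the intensity
    likelihood ratio e_i(k); beta0 is the probability that none of
    the gated measurements originated from the target.
    """
    weighted = [L * e for L, e in zip(likelihoods, features_lr)]
    b0 = 1.0 - P_D * P_G              # simplified "all false" weight
    denom = b0 + sum(weighted)
    beta0 = b0 / denom
    betas = [w / denom for w in weighted]
    return beta0, betas
```

Note how a bright, lane-like intensity (large `e_i`) pulls the association toward that measurement even when its spatial likelihood is modest.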

State Update
The state, gain, and covariance update equations of the PDAF are

$$\hat{\mathbf{x}}(k|k) = \hat{\mathbf{x}}(k|k-1) + W(k)\,V(k),$$
$$P(k|k) = \beta_0(k)\,P(k|k-1) + (1 - \beta_0(k))\,P^c(k|k) + \tilde{P}(k), \tag{44}$$

where $W(k)$ is the Kalman gain and $V(k)$ is the combined innovation, defined by

$$V(k) = \sum_{i=1}^{m_k} \beta_i(k)\,V_i(k),$$

where $m_k$ is the number of measurements inside the gating region, and $\beta_0(k)$ is the probability of all measurements being incorrect at time index $k$. Here, $P^c(k|k) = P(k|k-1) - W(k)\,S(k)\,W(k)^T$ is the covariance update of the standard Kalman filter. With no information on which of the $m_k$ measurements is correct, the correct updated covariance includes the spread-of-innovations term

$$\tilde{P}(k) = W(k)\left[\sum_{i=1}^{m_k} \beta_i(k)\,V_i(k)\,V_i(k)^T - V(k)\,V(k)^T\right]W(k)^T.$$
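For illustration, the combined-innovation update can be written out in one dimension (a sketch for scalar state and measurement; `pdaf_update_1d` is a hypothetical name):

```python
def pdaf_update_1d(x_pred, P_pred, z_list, betas, beta0, H, R):
    """PDAF update in one dimension: combined innovation
    V = sum_i beta_i (z_i - H x_pred), then
    P(k|k) = beta0 P_pred + (1 - beta0) P_c + P_tilde,
    with P_tilde the spread-of-innovations term."""
    S = H * P_pred * H + R                   # innovation covariance
    W = P_pred * H / S                       # Kalman gain
    innovs = [z - H * x_pred for z in z_list]
    V = sum(b * v for b, v in zip(betas, innovs))
    x_upd = x_pred + W * V
    P_c = P_pred - W * S * W                 # standard KF covariance
    spread = sum(b * v * v for b, v in zip(betas, innovs)) - V * V
    P_tilde = W * spread * W
    P_upd = beta0 * P_pred + (1.0 - beta0) * P_c + P_tilde
    return x_upd, P_upd
```

With a single measurement and $\beta_0 = 0$, this collapses to the ordinary Kalman update, which is a useful sanity check.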

Track Management
During the course of tracking, several tracks are maintained in parallel, and their states are continuously updated upon receiving measurements. A track can be in one of three states: tentative, confirmed, and terminated. During the initialization phase, every unassociated measurement forms a tentative track. Upon subsequent detections (measurements) and gating operations, tracks begin to form and their status is updated to confirmed. However, if no further measurement-to-track associations are possible, as happens when no detections are observed from the target responsible for the track, the corresponding track is terminated. For the case where $P_D < 1$, it is essential to check the quality of the measurement-to-track association before updating the track status. One approach for assessing this quality is the goodness of fit. The goodness of fit, often represented by the log-likelihood function of the track, can be expressed recursively as [46]

$$l(k) = l(k-1) + V(k)^T S(k)^{-1} V(k), \tag{49}$$

where $V(k)$ is the innovation and $S(k)$ is the covariance of the innovation $V(k)$. The last term in (49) has a chi-squared density with $n_z$ degrees of freedom, where $n_z$ is the dimensionality of the measurement vector. As the innovations are independent, the log-likelihood function at time $k$ is chi-squared distributed with $k n_z$ degrees of freedom. This is a measure of the goodness of fit to the assumed target model. Thus, the test for keeping (or terminating) a track can be expressed as

$$l(k) \le \chi^2_{k n_z}(1 - \alpha),$$

where the tail probability $\alpha$ is the probability that a true track will be rejected. In our case, this tail probability is around 0.01. However, the actual state transitions are performed through more rigorous checks: we maintain a number of (hidden) measurement and association counters for each track, and use these counters to assess the activeness of the measurement-track association.
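The counter-based status checks mentioned above can be sketched as a small state machine. `step_track_status`, `confirm_hits`, and `max_misses` are hypothetical names and thresholds, since the paper does not state its counter values:

```python
def step_track_status(status, hits, misses, associated,
                      confirm_hits=3, max_misses=5):
    """One step of a tentative/confirmed/terminated state machine.

    A successful association increments the hit counter and resets
    the miss streak; enough consecutive hits confirm a tentative
    track, and too many consecutive misses terminate it."""
    if associated:
        hits, misses = hits + 1, 0
        if status == "tentative" and hits >= confirm_hits:
            status = "confirmed"
    else:
        misses += 1
        if misses > max_misses:
            status = "terminated"
    return status, hits, misses
```

A track seeded from one measurement thus survives a few missed frames (e.g., a briefly occluded lane marking) before being dropped.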

Experiments and Evaluations
The proposed algorithm has been evaluated against two different baselines: model- and machine-learning-based implementations. The proposed algorithm was implemented using the OpenCV library in C++. The algorithms were tested on a system with an Intel i7 CPU, clocked at 2.9 GHz, with 16 GB RAM.

Evaluation against Model-Based Approaches
The proposed approach was benchmarked against two model-based methods, namely [18,55], using the Caltech Lanes dataset [18]. The dataset has four video clips taken at different times of day around the urban area of Pasadena, California. Each of these video clips has a resolution of 640 × 480, and they cover various lighting and illumination conditions, road writings alongside lane markings (Clip#1), sun glint and different pavement types (Clip#2), shadows and crosswalks (Clip#3), and congested settings (Clip#4). As such, they are reasonably representative of various challenging conditions for tracking lane markings. In total, 1224 frames and 4399 lane boundaries were processed. The details of these video clips are given in Table 2. Notice that, similar to [18,55], ROI parameters need to be set in our method. Our algorithm does not need camera parameters, but since [18,55] use IPM mapping, they require those parameters to be set.
During evaluation, we computed the true and false positive rates (TPR and FPR, respectively), where the TPR is the ratio of the number of detected lane boundaries to the number of target lane boundaries, and the FPR is the ratio of the number of false positives to the number of target lane boundaries. The frames were processed at a rate of seven frames per second, similar to the other methods in the literature [18,55]. In addition to the TPR and FPR metrics, we also included another metric, false positives per frame (FP/frame or FPF), which is the average number of false positives across all frames. One could equally consider a true positives per frame rate as well. We used the TPR, FPR, and FP/frame as the evaluation metrics for the model-based approaches. The results of the evaluation are shown in Table 3.
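The three metrics can be computed directly from the counts (a trivial sketch; `lane_metrics` is a hypothetical helper):

```python
def lane_metrics(tp, fp, targets, frames):
    """TPR, FPR, and FP/frame as defined above: both rates are
    normalized by the number of target lane boundaries, while
    FP/frame averages false positives over all frames."""
    return tp / targets, fp / targets, fp / frames
```

For example, 90 detected and 10 false lane boundaries against 100 targets over 50 frames give a TPR of 0.9, an FPR of 0.1, and 0.2 FP/frame.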
Noting that a higher TPR, lower FPR, and lower FP/frame are desirable, we highlight the best (boldface) and second-best (underlined) results in Table 3. A number of observations can be made here:
• The TPR performance of the proposed algorithm is consistently higher than those of the other two algorithms throughout all video clips. The performance of the proposed algorithm on Clip#3 is significantly higher than that of the method in [55];
• The FPR performance of the proposed algorithm is always better than that of the method in [18];
• The FPR performance of the proposed algorithm is better than that of the method in [55] except for Clip#4. One potential reason for this suboptimal performance on this clip is the difficulty of data association in congested settings; and
• The FP/frame performance is mixed across the cases.
Overall, the proposed approach outperforms the other two methods across all cases. The overall performance differences are 5%, 2%, and 2% for the TPR, FPR, and FPF cases, respectively, when compared against the second-best method, namely that of [55]. To ensure consistent ground truth, we manually analyzed the datasets to label all visible and invisible lane markings as target lane markings.

Evaluation against Deep Learning-Based Approaches
In this setting, we evaluated the proposed method against the spatial convolutional neural network (SCNN) method [38] using the TuSimple dataset [56]. The TuSimple dataset has about 7000 one-second-long video clips, each with 20 frames. Ground-truth information is available only for the last frame (frame 20) of each clip, including the height and width values corresponding to lanes. Although the TuSimple dataset includes a number of road conditions, such as straight lanes, curvy lanes, splitting and merging lanes, and shadows, we used only the straight and curvy lane conditions. Notice that, like all fully supervised models, the algorithm in [38] uses parameters obtained from a training process, whereas we only use the ROI parameters in the preprocessing step.
In addition to using TP and FN, we also use the accuracy and inference time as additional metrics of our evaluation. The overall results are shown in Table 4.
From these results, it can be seen that the proposed method outperforms the baseline method across all metrics. More specifically, the proposed method offers a 4% improvement in accuracy along with a ninefold speedup.

Conclusions
In this paper, we proposed a novel approach for multilane detection. By using the intensity feature in conjunction with the probabilistic Hough transform, we first formulated an algorithm for detecting and correctly grouping multiple lane markings. By representing these lane markings as splines, we then identified a set of control points, which are tracked over time across frames. Our evaluations, covering both model-based and machine-learning-based approaches, show that the proposed approach outperforms the model-based approaches and compares favorably against the deep-learning-based approach considered. However, a number of issues remain to be addressed. For instance, machine learning models do not provide any explanation or reasons for their decisions, in contrast to filter-based approaches such as the one presented here; the proposed approach thus embodies sufficient explainability for its actions. Further investigations are needed to establish how the performance of the proposed approach can be improved.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Association Probability for Lane Measurements
Assuming there are m detections at time k, when finding the association probability β_j between a measurement and a target (lane j), the overall event θ(j) comprises two mutually exclusive possibilities: either θ(j) is such that the measurement ψ(j) is from the target, for j = 1, . . . , m, or all measurements are false.
Here, the association probability set {β_j} is defined as the probability of the events {θ_j} given all the measurements Ψ, i.e., β_j = p{θ_j | Ψ}. In this appendix, it is shown how β_j is related to the feature likelihood ratio e_j of the measurement.
Let the set of validated measurements at time k be denoted as Ψ_k = {ψ_k(j)}, for j = 1, . . . , m_k. In general, the following gating criterion must be satisfied:

$$[\psi_k(j) - \hat{z}]^T S^{-1} [\psi_k(j) - \hat{z}] \le \gamma,$$

where γ is chi-squared distributed with n_z degrees of freedom. The association probability β_j for a set of gated measurements can then be stated as β_j = p{θ_j | Ψ, m_k}; using Bayes's rule, this can be expressed in terms of the measurement likelihoods p{Ψ | θ_j, m_k} and the prior event probabilities. Assuming that the false measurements are uniformly distributed within the validation region, and that the correct measurement location is Gaussian with mean $\hat{z}$ and covariance S, the probability density function of the correct measurement without the intensity feature is

$$p\{\psi_j \mid \theta_j, m_k\} = P_G^{-1}\, \mathcal{N}[\psi_j;\ \hat{z},\ S], \tag{A3}$$

where P_G is the gating probability. Since the intensities are independent of location within the validation region, the probability density function of a single correct measurement, including the intensity feature, can be expressed as the product of the intensity likelihood with L_j as follows:

$$p\{\psi_j \mid \theta_j, m_k\} = P^\tau_1(f_j)\, L_j, \tag{A4}$$

and for incorrect measurements

$$p\{\psi_j \mid m_k\} = V^{-1}\, P_0(f_j), \tag{A5}$$

where V is the volume of the validation region; the probability density function for all the measurements then follows as the product over j.

Appendix B. IPDAF-Based Lane Tracking Algorithm (Detailed)
The overall IPDAF algorithm, using the PDAF algorithm with special initialization and termination steps, is outlined in Algorithm A1 below.
Algorithm A1: IPDATracker (fragments recovered from the original listing; elided steps are marked with …):

     6:  Γ : composite data structure for tracks
     7:  Λ : a copy of Γ
     8:  Ω : a set containing associated tracks
     9:  α : a temporary variable
    10:  α_T : a temporary set of tracks
    11:  Λ ← Γ.Tracks
    12:  Ω ← ∅
    13:  if Λ.size() > 0 then
    14:      for i = 0; i < size(Γ); i++ do
             …
    21:      if V.size() > 0 then
    22:          for j = 0 to V.size() do
    23:              L_i(k) ← Equation (40)
    24:              e_i(k) ← Equation (…)
             …
    45:      α_T ← mergeTracks(Λ)
    46:      Γ.Tracks ← cleanTracks(α_T)
    47:  else
    48:      Γ.NonAssociated ← Ψ
    49:  end if
    50:  return Γ

The algorithm assumes a composite data type that is capable of capturing the evolution of the states over a period of time. As such, we use the dot notation to extract these properties. For instance, in our algorithm, Γ is a variable that holds the information of all the tracks being considered in the problem. Various properties, such as the track type and track ID (a unique identifier for each track), are extracted using the stated dot notation. Furthermore, the algorithm assumes the presence of the following auxiliary functions:
• getTrackType(): returns the type of track i, as temporary, retired, or active;
• getTrackID(): returns the unique identifier of the track;
• mergeTracks(): fuses the supplied set of tracks; and
• cleanTracks(): removes dead tracks from the list.