Non-Target Structural Displacement Measurement Using Reference Frame-Based Deepflow

Displacement is crucial for structural health monitoring, although it is very challenging to measure under field conditions. Most existing displacement measurement methods are costly, labor-intensive, and insufficiently accurate for measuring small dynamic displacements. Computer vision (CV)-based methods combine optical devices with advanced image processing algorithms to measure structural displacement accurately, cost-effectively, and remotely, with easy installation. However, non-target-based CV methods are still limited by insufficient feature points, incorrect feature point detection, occlusion, and drift induced by the accumulation of tracking errors. This paper presents a reference frame-based Deepflow algorithm integrated with masking and signal filtering for non-target-based displacement measurement. The proposed method allows the user to select points of interest for displacement tracking even in low-gradient image regions, and it calculates displacement directly, without drift accumulated from measurement error. The proposed method is experimentally validated on a cantilevered beam under ambient and occluded test conditions, and its accuracy is compared with that of a reference laser displacement sensor. A significant advantage of the proposed method is its flexibility in extracting structural displacement in any region of a structure, even one without distinct natural features.


Introduction
Structural health monitoring (SHM) increases a structure's lifetime and ensures its safety; the continuous monitoring provided by SHM allows early-stage damage detection and reduced downtime, and can potentially prevent failure during operation. For efficient monitoring, accurate and precise acquisition of structural response data is critical for condition assessment and data-driven decision-making. Structural displacement is one of the most important quantities for evaluating a structure's condition; traditionally, displacement is measured directly using a linear variable differential transformer (LVDT). One end of the LVDT is fixed to the structure and the other is attached to a stationary reference such as a scaffold. If the reference is fixed and stable, displacement can be measured to within a few micrometers. However, the use of LVDTs is hindered by practical difficulties in installing a reference point [1,2]; hence, measurement is limited to only a few points on a structure. A laser Doppler vibrometer (LDV), another direct measurement method, can provide high-resolution noncontact displacement data [3,4], but it is costly and restricted to measuring displacement in the direction of the emitted laser. features [25-37] for tracking displacement and system identification [38-43]. However, detecting a structure's natural features can be very difficult due to low contrast and unfavorable background conditions. To address these issues, the Kanade-Lucas-Tomasi (KLT) tracker [44,45] is widely employed for non-target-based displacement measurement, as it detects features such as bolts and edges based on the magnitude of the image gradient. Once features are detected, the KLT algorithm calculates optical flow [46,47], which is the velocity field of features across two input images; displacement is then obtained by integrating the Lucas-Kanade optical flow [47].
The performance of the KLT tracker in structural displacement measurement has been validated experimentally [28,42,48,49]. For example, a virtual vibration monitoring method was proposed that tracks feature points in selected regions of a structure, allowing multiple points to be tracked simultaneously [48]. System identification of a shear building structure was conducted by measuring displacement in multiple regions using a Harris corner feature detector and the KLT tracker [49]. However, two main challenges remain in implementing the KLT tracker: the disappearance or incorrect tracking of feature points resulting from a low gradient in the given image, and drift caused by the accumulation of tracking errors.
This paper proposes a novel method of non-target structural displacement measurement. The proposed method uses Deepmatching and Deepflow to find dense correspondences between two image frames and to calculate pixel-wise optical flow at points of interest (POIs), so that displacement can be measured from images with sparse feature points. Additionally, a reference frame-based displacement method is proposed to obtain drift-free displacement. The proposed method is validated through an experiment on a cantilever beam. The remainder of this paper is organized as follows: Section 2 briefly reviews optical flow and the KLT tracker, which are the most widely used techniques for displacement measurement. Section 3 explains the proposed method, including POI selection via masking, Deepmatching, Deepflow, reference frame-based displacement measurement, and signal filtering. Section 4 describes the experimental validation of the proposed method on a cantilever beam under ambient and occluded conditions. Finally, Section 5 presents conclusions drawn from the experimental validation.

Optical Flow
Optical flow [46,47] refers to the local displacement vector field of object motion between two consecutive frames, which arises from movement of the object or the camera. Calculating optical flow requires two basic assumptions: (1) brightness constancy, which assumes that the pixel intensities of an object in an image do not change between consecutive frames, and (2) small motion between consecutive images. If a pixel in image frame I(x, y, t) moves by distance (dx, dy) in the next frame, taken after a period of time dt, the following equation holds under the basic assumptions of optical flow:

$$I(x, y, t) = I(x + dx,\; y + dy,\; t + dt) \tag{1}$$

Expanding the right-hand side of Equation (1) in a Taylor series, removing higher-order terms, and dividing by dt yields

$$\frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = 0 \tag{2}$$

In compact notation, Equation (2) becomes

$$I_x V_x + I_y V_y = -I_t \tag{3}$$

where V_x = dx/dt and V_y = dy/dt are the components of the velocity, or optical flow, of I(x, y, t), and I_x, I_y, and I_t are the derivatives of the image at (x, y, t). Equation (3) is called the optical flow equation.
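As a quick numerical check of Equation (3), the sketch below computes I_x, I_y, and I_t with finite differences on a synthetic image pair and evaluates the residual I_x V_x + I_y V_y + I_t. It is a minimal NumPy illustration, not part of the proposed method; the smooth test pattern, the subpixel shift, and the helper name `optical_flow_residual` are hypothetical.

```python
import numpy as np


def optical_flow_residual(I1, I2, vx, vy):
    """Residual of the optical flow equation, I_x*V_x + I_y*V_y + I_t (ideally ~0)."""
    Iy, Ix = np.gradient(I1)  # np.gradient returns d/d(row) = d/dy first, then d/d(col) = d/dx
    It = I2 - I1              # temporal derivative with dt = 1 frame
    return Ix * vx + Iy * vy + It


# Smooth synthetic pattern translated by a subpixel motion (vx, vy) between two frames
y, x = np.mgrid[0:64, 0:64].astype(float)


def pattern(x, y):
    return np.sin(x / 6.0) + np.cos(y / 9.0)


vx, vy = 0.4, -0.3
I1 = pattern(x, y)
I2 = pattern(x - vx, y - vy)  # brightness constancy: I2(x + vx, y + vy) = I1(x, y)

r_true = optical_flow_residual(I1, I2, vx, vy)    # small: only higher-order terms remain
r_zero = optical_flow_residual(I1, I2, 0.0, 0.0)  # large: zero flow violates Eq. (3)
print(np.abs(r_true[2:-2, 2:-2]).max(), np.abs(r_zero).max())
```

The residual at the true flow is not exactly zero because Equation (2) drops the higher-order Taylor terms; it grows with the size of the motion, which is why the small-motion assumption matters.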

Lucas-Kanade Method and KLT Tracker
Equation (3) has two unknowns, so information from a single point in an image frame is not sufficient to determine the optical flow vector. The Lucas-Kanade tracker assumes a region of interest (ROI), or window, in which all points share the same constant optical flow vector, which leads to the following least-squares system:

$$\begin{bmatrix} \sum_i I_x(x_i)^2 & \sum_i I_x(x_i)\, I_y(x_i) \\ \sum_i I_x(x_i)\, I_y(x_i) & \sum_i I_y(x_i)^2 \end{bmatrix} \begin{bmatrix} V_x \\ V_y \end{bmatrix} = -\begin{bmatrix} \sum_i I_x(x_i)\, I_t(x_i) \\ \sum_i I_y(x_i)\, I_t(x_i) \end{bmatrix} \tag{4}$$

where i is the index of the pixels in the window. Note that the matrix on the left-hand side is a Hessian matrix built from the gradients of the first image, I(x, y, t), and it governs the stability of the solution to Equation (4): if its minimum eigenvalue is very small, the Hessian becomes nearly singular and its inverse cannot be computed reliably. The main idea behind the KLT tracker is to select only good features, for which the Hessian is well-conditioned, so that reliable tracking can be performed. The KLT tracker allows fast computation of optical flow, as only a sparse set of good features is tracked across frames. However, due to this sparsity, tracking accuracy relies heavily on the quality of the feature points, whose appearance may change over time due to movement. Also, because displacement is obtained by integrating the optical flow V between pairs of subsequent images, small errors may accumulate and result in drift.
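The least-squares solve of Equation (4), together with the minimum-eigenvalue test that decides whether a window is a "good feature", can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: the function name and the `min_eig` threshold are hypothetical, and a practical KLT tracker would add image pyramids and iterative refinement.

```python
import numpy as np


def lucas_kanade_window(Ix, Iy, It, min_eig=1e-3):
    """Solve Eq. (4) for one window; return (Vx, Vy), or None if the window is ill-conditioned."""
    ix, iy, it = Ix.ravel(), Iy.ravel(), It.ravel()
    H = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])          # 2 x 2 matrix on the left-hand side of Eq. (4)
    b = -np.array([ix @ it, iy @ it])            # right-hand side of Eq. (4)
    if np.linalg.eigvalsh(H).min() < min_eig:    # small eigenvalue: not a "good feature"
        return None
    return np.linalg.solve(H, b)


rng = np.random.default_rng(0)
Ix, Iy = rng.standard_normal((7, 7)), rng.standard_normal((7, 7))  # textured 7 x 7 window
V_true = np.array([0.3, -0.2])
It = -(Ix * V_true[0] + Iy * V_true[1])          # temporal derivative consistent with Eq. (3)

V = lucas_kanade_window(Ix, Iy, It)              # recovers V_true
flat = lucas_kanade_window(np.zeros((7, 7)), np.zeros((7, 7)), np.zeros((7, 7)))  # None
edge = lucas_kanade_window(Ix, np.zeros((7, 7)), It)  # None: gradient in one direction only
```

The flat and edge windows show why KLT cannot track points inside a low-gradient region: the Hessian is singular there, so the window is rejected rather than tracked. The absolute value of `min_eig` depends on the gradient scale and window size and would need tuning in practice.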

Overview
Despite its fast computation, the KLT tracker applied to displacement monitoring suffers from long-term drift and loss of feature points due to object occlusion and movement. The proposed method combines Deepflow [50], which provides pixel-wise optical flow, with POI selection to extract the displacement of interest. In addition, the proposed method eliminates displacement drift by calculating the optical flow of each incoming image frame with respect to the initial reference frame, yielding the displacement field directly, without the numerical integration of optical flow used in the KLT method. An overview of the proposed structural displacement measurement method is illustrated in Figure 1. Calibration is conducted in the first stage to compensate for lens distortion. During initialization, the initial image frame is set as the reference, and POIs that denote the pixel coordinates for displacement measurement are defined using a mask. Once the reference frame and POIs are set, the optical flow between the reference frame and each subsequent input frame is computed using Deepmatching [51] and Deepflow [50], tracking the movement of the POIs. The measured displacements are filtered to eliminate noisy measurements and then averaged to provide high-accuracy displacement results.

Camera Calibration
Camera calibration compensates for image errors induced by lens distortion and viewing position by identifying the intrinsic parameters, extrinsic parameters, and distortion coefficients. These parameters can be identified by employing a pinhole camera model to estimate the scale factor λ, intrinsic matrix K, translation vector T, and rotation matrix R:

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left[ R \mid T \right] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where (u, v) are pixel coordinates and (X, Y, Z) are the corresponding global coordinates. The scale factor links pixels to the corresponding distances in global coordinates. The intrinsic matrix K is related to the camera's intrinsic properties, including the focal lengths (f_x, f_y), the skew parameter s, and the principal point (c_x, c_y). The extrinsic parameters describe the physical position and orientation of the camera and comprise the rotation matrix and translation vector. The proposed method calibrates the intrinsic parameters in the lab to compensate for distortion and obtains a scaling factor by comparing the number of pixels spanned by a target structure in an image frame with the corresponding actual distance. The rotation matrix and translation vector are assumed to be an identity matrix and a zero vector, respectively, because only one stationary camera was used in the experimental validation.
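The pinhole model above can be sketched numerically as a single matrix product. This minimal NumPy example projects a global point with R = I and T = 0, matching the single stationary camera assumption; the intrinsic values (focal lengths of 1000 px, principal point at the image centre, zero skew) and the function name `project_points` are hypothetical and not the calibrated values from the experiment.

```python
import numpy as np


def project_points(points_xyz, K, R=None, T=None):
    """Pinhole projection: lambda * [u, v, 1]^T = K [R | T] [X, Y, Z, 1]^T.

    points_xyz: (N, 3) global coordinates. Returns (N, 2) pixel coordinates and (N,) lambda.
    R defaults to the identity matrix and T to the zero vector (single stationary camera).
    """
    R = np.eye(3) if R is None else np.asarray(R, dtype=float)
    T = np.zeros(3) if T is None else np.asarray(T, dtype=float)
    P = K @ np.hstack([R, T.reshape(3, 1)])                         # 3 x 4 projection matrix
    homog = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # (N, 4) homogeneous points
    proj = homog @ P.T                                              # rows: lambda * [u, v, 1]
    lam = proj[:, 2]                                                # equals depth Z when R = I, T = 0
    return proj[:, :2] / lam[:, None], lam


# Hypothetical intrinsics: f_x = f_y = 1000 px, zero skew, principal point (960, 540)
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
uv, lam = project_points(np.array([[0.1, 0.05, 1.0]]), K)  # uv -> (1060, 590) px, lambda -> 1.0
```

In this formulation λ is the homogeneous scale, which for R = I and T = 0 is simply the depth of the point along the optical axis.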

POI Selection by Masking
Estimating dense optical flow from correspondences in a high-resolution input image or video is computationally expensive. Cropping the input image and selecting POIs as preprocessing steps can provide greater efficiency; the flow is then estimated from the POI features within the cropped image. In the KLT method, the selected region must be larger than the structural features to extract enough feature points for reliable tracking; as a result, points outside the structural area can be detected. The proposed POI selection by masking efficiently extracts points for tracking in non-target-based CV applications, where detecting natural feature points or patterns can be challenging. Figure 2 illustrates the proposed masking method, which selects POIs using a binary mask; these points are tracked by the dense optical flow vectors calculated using Deepflow, which is explained in Section 3.3. The main advantage of POI selection is that it acquires dense POIs on the structure regardless of distinctive features, patterns, or textures, which is very challenging with the KLT method.
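Mask-based POI selection amounts to taking every nonzero pixel of a binary mask as a tracking point and reading the dense flow field at those coordinates. The sketch below assumes a flow array of shape (H, W, 2) holding (dx, dy) per pixel; the mask geometry, the uniform toy flow standing in for Deepflow output, and the helper names are hypothetical.

```python
import numpy as np


def select_pois(mask):
    """Return the (row, col) coordinates of every nonzero pixel of a binary mask as POIs."""
    return np.argwhere(mask > 0)


def sample_flow_at_pois(flow, pois):
    """Read a dense flow field of shape (H, W, 2), storing (dx, dy) per pixel, at the POIs.

    Returns an (m, 2) array with one (dx, dy) row per POI.
    """
    return flow[pois[:, 0], pois[:, 1]]


# Hypothetical example: a rectangular mask over a textureless structural region
mask = np.zeros((20, 30), dtype=np.uint8)
mask[5:10, 10:20] = 1                    # 5 x 10 = 50 POIs, no feature detector involved
pois = select_pois(mask)

flow = np.zeros((20, 30, 2))
flow[..., 0], flow[..., 1] = 1.5, -0.5   # uniform toy flow standing in for Deepflow output
d = sample_flow_at_pois(flow, pois)      # (50, 2) displacements, one row per POI
```

Because the POIs come from the mask rather than from gradient magnitude, points inside low-texture regions are included just as readily as points on edges.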

Deepmatching
Deepmatching computes dense correspondences between the reference image and the target image. The matching algorithm is based on a multilayered architecture similar to that of deep convolutional networks (see Figure 3). Deepmatching splits the image at the i-th frame into nonoverlapping 4 × 4 pixel atomic patches and convolves each patch with the image at the j-th frame to obtain a response map for that patch; this process is repeated for all patches. In the aggregation stage, the response maps are max-pooled with a 3 × 3 filter and downsampled by a factor of two to reduce computational complexity. Average pooling is then applied to the preprocessed response maps extracted from four neighboring patches. The final aggregation step is nonlinear filtering, which avoids fast convergence. Through aggregation, virtual response maps for 8 × 8, 16 × 16, and 32 × 32 patches are constructed; the procedure is iterated to obtain a multiscale pyramid. Note that the pyramid is built bottom-up, whereas corresponding matches are extracted top-down by finding scale-space local maxima and backtracking their configuration to obtain quasi-dense correspondences.
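The first Deepmatching stage (a response map for one 4 × 4 atomic patch, followed by 3 × 3 max-pooling and subsampling by two) can be sketched as below. This is a simplified illustration under stated assumptions: normalized cross-correlation of raw pixels stands in for the descriptor similarity used by the actual algorithm, the averaging of four child maps and the nonlinear filtering at higher levels are omitted, and the function names are hypothetical.

```python
import numpy as np


def patch_response(patch, target):
    """Response map of one atomic patch: normalized cross-correlation with every
    same-size window of the target image (valid positions only)."""
    ph, pw = patch.shape
    p = patch - patch.mean()
    p = p / (np.linalg.norm(p) + 1e-8)
    out_h, out_w = target.shape[0] - ph + 1, target.shape[1] - pw + 1
    resp = np.zeros((out_h, out_w))
    for yy in range(out_h):
        for xx in range(out_w):
            win = target[yy:yy + ph, xx:xx + pw]
            win = win - win.mean()
            resp[yy, xx] = np.sum(p * win) / (np.linalg.norm(win) + 1e-8)
    return resp


def max_pool_and_subsample(resp, pool=3, step=2):
    """3 x 3 max-pooling (stride 1, edge padding) followed by subsampling by a factor of 2."""
    r = pool // 2
    padded = np.pad(resp, r, mode="edge")
    h, w = resp.shape
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(pool) for dx in range(pool)]
    return np.max(shifted, axis=0)[::step, ::step]


rng = np.random.default_rng(0)
ref = rng.random((32, 32))                       # frame i
tgt = np.roll(ref, shift=(2, 3), axis=(0, 1))    # frame j: content moves 2 px down, 3 px right
patch = ref[8:12, 8:12]                          # one 4 x 4 atomic patch of frame i
resp = patch_response(patch, tgt)                # 29 x 29 response map
peak = np.unravel_index(np.argmax(resp), resp.shape)  # patch origin (8, 8) + shift (2, 3)
coarse = max_pool_and_subsample(resp)            # 15 x 15 map passed to the next level
```

Max-pooling before subsampling keeps the strongest response in each neighborhood, so the peak survives the halving of resolution; this is what lets the coarser levels tolerate small deformations.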

Deepflow
Deepflow is a variational optical flow method that combines color and gradient constancy constraints with global smoothness of the computed flow field and blends the Deepmatching correspondences into an energy minimization framework. The energy to be minimized is a weighted sum of a data term E_D, a smoothness term E_S, and a matching term E_M:

$$E(\mathbf{w}) = \int_{\Omega} \left( E_D + \alpha E_S + \beta E_M \right) d\mathbf{x}$$

where w = (u, v)^T is the optical flow field, x := (x, y)^T denotes a point in the image domain Ω, and α and β are tuning parameters. The data term E_D penalizes violations of the brightness and gradient constancy assumptions; it is the sum of two terms, balanced by weights δ and γ:

$$E_D = \delta\, \psi\!\left( \left| I_2(\mathbf{x} + \mathbf{w}) - I_1(\mathbf{x}) \right|^2 \right) + \gamma\, \psi\!\left( \left| \nabla I_2(\mathbf{x} + \mathbf{w}) - \nabla I_1(\mathbf{x}) \right|^2 \right)$$

where I_1 and I_2 are the reference and target images and ψ is a robust penalty function that handles occlusions. The smoothness term enforces regularity by penalizing the total variation of the flow field:

$$E_S = \psi\!\left( \|\nabla u\|^2 + \|\nabla v\|^2 \right)$$

The matching term draws the flow estimate toward a precomputed vector field by penalizing the difference between the computed flow w and the precomputed matches w′:

$$E_M = c\, \phi\, \psi\!\left( \|\mathbf{w} - \mathbf{w}'\|^2 \right)$$

where c is a binary term equal to 1 where a match is available and φ is a weight that is low where the match is likely false. An incremental coarse-to-fine warping strategy is employed to minimize this nonconvex, nonlinear energy functional.
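To make the roles of the three terms concrete, the sketch below evaluates a simplified, discretized version of the energy for a given flow field. It is not Deepflow itself and performs no minimization. Assumptions: the data terms are linearized around zero flow (valid only for small motion), derivatives are taken from I_1, α, β, δ, and γ are constants, and ψ(s²) = sqrt(s² + ε²) with ε = 10⁻³ is an assumed Charbonnier-type penalty. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-3  # assumed epsilon of the robust penalty


def psi(s2):
    """Robust penalty psi(s^2) = sqrt(s^2 + eps^2) (assumed Charbonnier-type form)."""
    return np.sqrt(s2 + EPS ** 2)


def grad_xy(img):
    """Return (d/dx, d/dy); np.gradient yields the row (y) derivative first."""
    gy, gx = np.gradient(img)
    return gx, gy


def deepflow_energy(I1, I2, u, v, u_m, v_m, c, phi,
                    alpha=1.0, beta=1.0, delta=1.0, gamma=1.0):
    """Evaluate a linearized, discretized E = sum(E_D + alpha*E_S + beta*E_M) for flow (u, v).

    u, v, u_m, v_m, c, phi are arrays with the same shape as the images.
    """
    Ix, Iy = grad_xy(I1)
    # E_D, brightness constancy linearized around zero flow: I_x u + I_y v + I_t
    bc = (Ix * u + Iy * v + (I2 - I1)) ** 2
    # E_D, gradient constancy linearized the same way (second derivatives taken from I1)
    Ixx, Ixy = grad_xy(Ix)
    Iyx, Iyy = grad_xy(Iy)
    Jx, Jy = grad_xy(I2)
    gc = (Ixx * u + Ixy * v + (Jx - Ix)) ** 2 + (Iyx * u + Iyy * v + (Jy - Iy)) ** 2
    E_D = delta * psi(bc) + gamma * psi(gc)
    # E_S: robust penalty on the spatial gradients of the flow
    ux, uy = grad_xy(u)
    vx, vy = grad_xy(v)
    E_S = psi(ux ** 2 + uy ** 2 + vx ** 2 + vy ** 2)
    # E_M: pull toward precomputed matches (u_m, v_m) where c = 1, weighted by phi
    E_M = c * phi * psi((u - u_m) ** 2 + (v - v_m) ** 2)
    return np.sum(E_D + alpha * E_S + beta * E_M)


# Toy check: for a small translation, the true flow has lower energy than zero flow
y, x = np.mgrid[0:48, 0:48].astype(float)
I1 = np.sin(x / 5.0) * np.cos(y / 7.0)
I2 = np.sin((x - 0.5) / 5.0) * np.cos((y - 0.25) / 7.0)  # pattern moved by (0.5, 0.25) px
shape = I1.shape
zeros, ones = np.zeros(shape), np.ones(shape)
u_true, v_true = np.full(shape, 0.5), np.full(shape, 0.25)

E_true = deepflow_energy(I1, I2, u_true, v_true, zeros, zeros, zeros, zeros)
E_zero = deepflow_energy(I1, I2, zeros, zeros, zeros, zeros, zeros, zeros)
# With matches that agree with the true flow, the matching term also favours it
E_true_m = deepflow_energy(I1, I2, u_true, v_true, u_true, v_true, ones, ones)
E_zero_m = deepflow_energy(I1, I2, zeros, zeros, u_true, v_true, ones, ones)
```

Setting c to zero switches the matching term off, which reduces the model to a classical brightness/gradient constancy formulation; with reliable matches, E_M adds a penalty that steers the solution toward the Deepmatching correspondences, which is what helps with large or low-texture motion.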

Reference Frame-Based Displacement Measurement
Reference frame-based displacement measurement takes the initial frame as the reference and calculates the optical flow between it and the current frame to obtain the displacement field directly, without integrating the optical flow between pairs of subsequent images as KLT methods do. The KLT approach, which relies on two subsequent images, can be disturbed by occlusion of the camera by obstacles and by the accumulation of tracking error at each subsequent frame, which causes displacement drift. Reference frame-based displacement measurement instead calculates the displacement of each input frame relative to the reference frame. Figure 4 illustrates reference frame-based measurement, where d_m represents the measured displacement between the i-th frame and the reference frame at the m-th POI.
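The difference between integrating frame-to-frame flow and measuring against a fixed reference frame can be illustrated with a toy simulation. The sketch assumes, hypothetically, that every flow estimate equals the true motion plus a small constant bias and white noise (in pixels); the signal, bias, noise level, and function name are illustrative only.

```python
import numpy as np


def compare_tracking_schemes(n_frames=1000, fps=60.0, bias=0.01, noise=0.02, seed=0):
    """Toy comparison of incremental (frame-to-frame) vs reference frame-based displacement.

    Every optical flow estimate is modelled as truth + a small constant bias + white noise (px).
    Returns the absolute displacement error of each scheme for every frame.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames) / fps
    d_true = np.sin(2 * np.pi * 4.8 * t)          # toy structural displacement in px
    err = bias + noise * rng.standard_normal(n_frames)
    err[0] = 0.0                                  # frame 0 compared with itself

    # Incremental scheme: flow between frames k-1 and k, then numerically integrated
    increments = np.diff(d_true) + err[1:]
    d_incremental = np.concatenate([[0.0], np.cumsum(increments)])

    # Reference frame-based scheme: flow between frame 0 and frame k, used directly
    d_reference = d_true + err

    return np.abs(d_incremental - d_true), np.abs(d_reference - d_true)


inc_err, ref_err = compare_tracking_schemes()
print(f"incremental final error: {inc_err[-1]:.2f} px, reference-based max error: {ref_err.max():.2f} px")
```

With the same per-estimate error, the integrated scheme accumulates the bias and noise into a drift that grows with the number of frames, whereas the reference-based error stays at the level of a single estimate.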

Outlier Filtering and Signal Averaging
Deepflow provides time-series displacement for each pixel in a selected ROI. However, the estimated flow may include outliers caused by noise, such as vanishing features or incorrectly matched feature and background points. This section proposes a filtering process for extracting accurate points related to the displacement of a target object.
Let D be a matrix containing the offset-removed displacements at the POIs, D = [d_1, d_2, …, d_m]. D is first threshold-filtered using its median value to remove points related to the stationary background, yielding D_filter = [d_f1, d_f2, …, d_fn], where D_filter contains the filtered displacements and n is the number of filtered displacements. To improve measurement accuracy, outlier filtering based on the correlation coefficient is then applied. The correlation coefficient matrix R_ij between the filtered displacements is

$$R_{ij} = \frac{E\left[ \left(d_{f_i} - \mu_i\right)\left(d_{f_j} - \mu_j\right) \right]}{\sigma_i\, \sigma_j}$$

where μ_i and σ_i are the mean and standard deviation of d_fi. From R, the mean cross-correlation of each point is collected in MCR = {mcr_1, mcr_2, …, mcr_m}, where the subscript indexes the points on the POI. Points whose MCR exceeds a defined threshold are selected, and the corresponding displacements on the POI are extracted.
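The filtering chain (median threshold, correlation matrix, mean cross-correlation, averaging) can be sketched as below. The exact form of the median threshold is not given in the text, so the sketch assumes that a POI is kept when its RMS amplitude is at least `amp_ratio` times the median RMS amplitude; both thresholds and the synthetic data are hypothetical. The mean cross-correlation here includes each point's self-correlation R_ii = 1.

```python
import numpy as np


def filter_and_average(D, amp_ratio=0.5, mcr_threshold=0.5):
    """D: (T, m) array whose column k is the offset-removed displacement history of POI k.

    Returns the averaged displacement history and the indices of the POIs that were kept.
    """
    # Step 1 (assumed form of the median threshold): drop near-stationary POIs whose
    # RMS amplitude is below amp_ratio times the median RMS amplitude.
    amp = np.sqrt(np.mean(D ** 2, axis=0))
    kept = np.flatnonzero(amp >= amp_ratio * np.median(amp))
    D_filter = D[:, kept]
    # Step 2: Pearson correlation matrix R_ij between the filtered histories
    R = np.corrcoef(D_filter, rowvar=False)
    mcr = R.mean(axis=1)                  # mean cross-correlation of each point (incl. R_ii = 1)
    selected = kept[mcr > mcr_threshold]
    # Step 3: average the surviving histories into one displacement signal
    return D[:, selected].mean(axis=1), selected


rng = np.random.default_rng(1)
T = 500
truth = np.sin(2 * np.pi * 4.8 * np.arange(T) / 60.0)
structure = truth[:, None] + 0.01 * rng.standard_normal((T, 6))  # POIs on the beam
background = 0.01 * rng.standard_normal((T, 3))                  # stationary background
outlier = rng.standard_normal((T, 1))                             # mismatched, noisy POI
D = np.hstack([structure, background, outlier])

avg, selected = filter_and_average(D)   # keeps the six structural columns only
```

The two stages catch different failure modes: the amplitude threshold removes motionless background points, which would otherwise shrink the averaged displacement, while the correlation test removes points that move but not coherently with the rest of the structure.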

Experimental Setup
An experiment was carried out to validate the proposed non-target-based structural displacement measurement method and to compare it with the KLT method and a laser displacement sensor. A subsequent experiment with environmental disturbance was conducted by blocking the camera during measurement to simulate occlusion. The experimental setup is shown in Figure 5. A steel cantilever beam with a height of 1000 mm and a cross-section of 100 mm × 5 mm was used as the testbed. The first three natural frequencies of the beam were 4.8 Hz, 24.4 Hz, and 69 Hz. Video of the beam's motion was recorded with a Samsung Galaxy S9+ mobile phone camera placed 1 m from the beam, using the 4K UHD setting (60 fps, 3840 × 2160 pixels). The camera was calibrated with 20 images of a checkerboard to obtain its intrinsic parameters, and lens distortion was corrected. The reference displacement was measured using an ILD-1420 laser displacement sensor with a 1 kHz sampling rate. A logo was attached to the back of the beam to artificially introduce noise in feature tracking; only this region was cropped for efficient image processing. The scale factor (1 mm/2.8 pixels) was obtained by comparing the 5 mm beam thickness with the corresponding number of image pixels.
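The scale factor is a simple ratio between a known physical length and its pixel span. In the sketch below, the 14 px span is not stated in the text; it is only the value implied by the reported 1 mm/2.8 pixels and the 5 mm thickness. The function names are hypothetical.

```python
def mm_per_pixel(known_length_mm, length_px):
    """Scale factor obtained from a feature of known physical size measured in the image."""
    return known_length_mm / length_px


def px_to_mm(d_px, scale_mm_per_px):
    """Convert a pixel displacement into millimetres."""
    return d_px * scale_mm_per_px


# 5 mm beam thickness; 14 px is the pixel span implied by the stated 1 mm / 2.8 px
scale = mm_per_pixel(5.0, 14.0)   # about 0.357 mm per pixel
one_mm_in_px = 1.0 / scale        # 2.8 px
rmse_in_px = 0.07 / scale         # a 0.07 mm error corresponds to about 0.2 px
```

The last line shows that the 0.07 mm RMSE reported for the proposed method corresponds to roughly a fifth of a pixel at this scale, so the measurement operates at subpixel resolution.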
Figure 6 shows the cropped ROI from an image. As the image shows, the feature points tracked for displacement measurement over frames must be carefully selected for reliable tracking. The proposed masking-based POI selection allows any points to be chosen for tracking, so 77 points inside the structure were selected for displacement tracking (see Figure 6). In the KLT method, feature points are selected by feature detection algorithms such as the Harris corner detector and the scale-invariant feature transform (SIFT), which rely on the gradient of the given image.
In contrast, the KLT method with Harris corner feature detection detected only ten points, which included the edge of the beam and background features; no features were detected inside the structure because feature point detection depends heavily on gradient magnitude. Figure 7 shows that the region inside the structure has a very small gradient magnitude, resulting in no feature detection, whereas the edges and background with strong gradients coincide with the points where features were extracted.

Displacement Measurement under Ambient Conditions
The KLT and proposed methods were used to measure the displacement of the cantilever beam, and the results were compared with reference data measured by the laser displacement sensor. The displacements measured using the KLT and proposed methods are shown in Figure 8. To compare the reference displacement sensor with the proposed and KLT methods, a third-order Butterworth low-pass filter with a 30 Hz cutoff was applied to the reference displacement sensor signal, and the measurements were synchronized. The maximum errors and root-mean-square errors (RMSE) of the displacements are compared in Table 1. The proposed method showed a maximum displacement error of 0.43% relative to the reference displacement sensor, whereas the KLT method showed an error of 19.77%. In terms of RMSE, the proposed method had a very small error (0.07 mm), validating its accuracy in measuring displacements of less than 0.1 mm, whereas the KLT method showed an error of 0.29 mm. The ratio of the RMSE to the maximum reference displacement was 3.80% for the proposed method and 16% for the KLT method. The KLT method showed relatively lower accuracy because of scaling errors and drift. Because the KLT method tracks feature points determined by feature detection algorithms such as Harris corner and SIFT, the detected features are likely to include background features that do not follow the structural motion, or noisy features that are strongly affected by changes in brightness.
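The comparison procedure (low-pass filtering the 1 kHz reference, resampling it at the camera frame times, and computing error metrics) can be sketched with SciPy. Assumptions: the filter is applied forward-backward with `sosfiltfilt` to avoid phase lag (which squares the magnitude response; the text does not say how it was applied), synchronization is reduced to linear interpolation onto known frame timestamps, and the "maximum displacement error" is taken as the relative difference of peak absolute displacements. The signals are synthetic and the function names hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def lowpass_reference(d_ref, fs_ref=1000.0, cutoff_hz=30.0, order=3):
    """Third-order Butterworth low-pass at 30 Hz, applied forward-backward (zero phase)."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs_ref, output="sos")
    return sosfiltfilt(sos, d_ref)


def resample_to_frames(t_ref, d_ref, t_frames):
    """Linearly interpolate the 1 kHz reference onto the camera frame timestamps."""
    return np.interp(t_frames, t_ref, d_ref)


def error_metrics(d_est, d_ref):
    """Return (peak error in %, RMSE, RMSE as % of the peak reference displacement)."""
    peak_ref = np.max(np.abs(d_ref))
    peak_err_pct = abs(np.max(np.abs(d_est)) - peak_ref) / peak_ref * 100.0
    rmse = np.sqrt(np.mean((d_est - d_ref) ** 2))
    return peak_err_pct, rmse, rmse / peak_ref * 100.0


# Toy data: 4.8 Hz response sampled at 1 kHz with 200 Hz contamination
t_ref = np.arange(5000) / 1000.0
noisy = np.sin(2 * np.pi * 4.8 * t_ref) + 0.05 * np.sin(2 * np.pi * 200.0 * t_ref)
t_frames = np.arange(0.5, 4.5, 1.0 / 60.0)        # 60 fps frame times, away from filter edges
d_ref_frames = resample_to_frames(t_ref, lowpass_reference(noisy), t_frames)
truth_frames = np.sin(2 * np.pi * 4.8 * t_frames)
```

A displacement estimate scaled by 1.1 relative to the reference, for example, yields a 10% peak error under this definition, which is the kind of scaling error attributed to the KLT method.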
The displacements of the detected features are simply averaged without filtering, so the resulting displacement becomes smaller than the actual structural displacement because the motionless background is included. Moreover, noisy features cause displacement drift as errors accumulate through numerical integration.
The scaling error and drift are clearly visible when the frequency domains are compared in Figure 9. The magnitude of the power spectral density (PSD) at the first natural frequency (4.8 Hz) from the proposed method agrees very well with that from the reference displacement sensor, indicating that the proposed method captured the dynamic response and correctly identified the frequency peak. However, the PSD magnitude at the first natural frequency from the KLT method was smaller than that from the reference displacement sensor, indicating that the displacement was underestimated. Furthermore, the proposed method had almost the same PSD magnitude in the 0–0.1 Hz range as the reference displacement sensor, whereas the KLT method showed a magnitude as high as that at the first natural frequency, indicating significant drift in the resulting displacement.
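The two KLT error mechanisms described above, and how they show up in a PSD comparison like Figure 9, can be reproduced with a small synthetic experiment. This is an illustrative sketch with hypothetical numbers, not the authors' data: the sampling rate, amplitude, noise level, and the 80% share of moving features are all assumptions. An incremental (frame-to-frame) estimate that averages in static background features is scaled down, and integrating its noisy increments produces random-walk drift; a reference-frame estimate has neither effect. The PSD is estimated with SciPy's `welch`.

```python
import numpy as np
from scipy.signal import welch

FS = 100.0           # hypothetical sampling rate (Hz)
F1 = 4.8             # first natural frequency reported in the text (Hz)
NPERSEG = 4096       # Welch segment length, about 0.024 Hz resolution at FS
MOVING_SHARE = 0.8   # hypothetical: 20% of tracked features are static background
NOISE = 0.02         # hypothetical per-frame measurement noise (mm)


def simulate_tracking(duration=200.0, amplitude=1.8, seed=1):
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, 1.0 / FS)
    true = amplitude * np.sin(2 * np.pi * F1 * t)
    # Reference-frame estimate: each sample has its own, non-accumulating error.
    ref_based = true + rng.normal(0.0, NOISE, t.size)
    # Incremental estimate: averaged increments shrink by MOVING_SHARE because static
    # background features report zero motion, and cumsum integrates their noise.
    steps = MOVING_SHARE * np.diff(true, prepend=0.0) + rng.normal(0.0, NOISE, t.size)
    incremental = np.cumsum(steps)
    return true, ref_based, incremental


def psd_at(x, f0):
    """Welch PSD value at the frequency bin nearest to f0."""
    f, pxx = welch(x, fs=FS, nperseg=NPERSEG)
    return float(pxx[np.argmin(np.abs(f - f0))])


def band_power(x, f_lo, f_hi):
    """Power of x between f_lo and f_hi, integrated from the Welch PSD."""
    f, pxx = welch(x, fs=FS, nperseg=NPERSEG)
    band = (f >= f_lo) & (f <= f_hi)
    return float(np.sum(pxx[band]) * (f[1] - f[0]))


true, ref_based, incremental = simulate_tracking()
scale_ratio = psd_at(incremental, F1) / psd_at(ref_based, F1)  # close to MOVING_SHARE**2
drift_ratio = band_power(incremental, 0.0, 0.1) / band_power(ref_based, 0.0, 0.1)
```

Because PSD scales with the square of amplitude, a 20% underestimate appears as roughly a 36% drop of the peak at 4.8 Hz, while the integrated noise lifts the 0–0.1 Hz band far above the reference: the same two signatures the text reads from Figure 9.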
The computing times of the proposed and KLT methods were also compared. The proposed method computes displacement within a 440 × 320 region cropped from the original image (3840 × 2160, 4K), whereas the KLT method calculates displacement from the full-size image. All software was run on a PC with an Intel i7-8700 CPU and 32 GB of RAM. The displacement computation took an average of 0.8 s per frame with the proposed method and 0.25 s per frame with the KLT method. Given that the proposed method achieves an average of 1.2 fps, it can be a good choice when precise non-target measurement is required.
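The cost comparison in this paragraph comes down to how many pixels the displacement is computed on and how per-frame time converts to throughput. The sketch below is a hypothetical harness, not the authors' benchmark: it assumes the 440 × 320 crop is width × height (matching the 3840 × 2160 convention), and `estimator` is a placeholder for any per-frame displacement routine, such as a dense optical-flow call.

```python
import time

import numpy as np

FRAME_W, FRAME_H = 3840, 2160  # 4K frame size from the text (width x height)
ROI_W, ROI_H = 440, 320        # cropped region; width x height order is assumed


def crop_roi(frame, x0, y0, width=ROI_W, height=ROI_H):
    """Return the ROI as a view (no copy); raise if it falls outside the frame."""
    h, w = frame.shape[:2]
    if x0 < 0 or y0 < 0 or x0 + width > w or y0 + height > h:
        raise ValueError("ROI lies outside the frame")
    return frame[y0:y0 + height, x0:x0 + width]


def mean_seconds_per_frame(estimator, reference, frames):
    """Average wall-clock time of estimator(reference, frame) over all frames."""
    start = time.perf_counter()
    for frame in frames:
        estimator(reference, frame)
    return (time.perf_counter() - start) / len(frames)


def frames_per_second(seconds_per_frame):
    return 1.0 / seconds_per_frame


roi_fraction = (ROI_W * ROI_H) / (FRAME_W * FRAME_H)  # share of 4K pixels processed
```

With the figures reported above, `frames_per_second(0.8)` gives 1.25 fps for the proposed method and `frames_per_second(0.25)` gives 4 fps for KLT, and `roi_fraction` is about 0.017, so the crop covers under 2% of the 4K frame.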

Displacement Measurement under Disturbed Condition
A second experiment was conducted to evaluate the robustness of the proposed method to occlusion, which matters for long-term measurements. In the field, many factors can block a camera's line of sight and interfere with vision systems, causing significant measurement errors. To simulate occlusion, the camera's view was blocked with a sheet of white A4 paper for about 1 s and then cleared. Figure 10 shows the displacements measured by the KLT and proposed methods. As shown in Figure 10b, the KLT method measured a very large displacement when occlusion occurred. Because the KLT method tracks feature points detected from the difference between the structure and the surrounding background, parts of the obstacle were mistakenly recognized as feature points, which caused errors. Additionally, after the camera's view was restored, the feature points could not be recovered properly, resulting in an offset error. In contrast, Figure 10a shows that the proposed method, which captures features inside the structure using a masking technique, continuously measured displacement by correctly recovering the feature points.
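Why a reference-frame estimate recovers after occlusion while a frame-to-frame tracker keeps an offset can be shown with a toy simulation. Everything in it is hypothetical (60 fps, a 60-frame occlusion, an obstacle that appears to move 0.2 mm per frame): the incremental tracker sums frame-to-frame increments, so increments picked up from the obstacle stay in the running sum, whereas the reference-frame estimate compares every frame with the reference frame and simply has no valid value while the view is blocked.

```python
import numpy as np

FPS = 60.0                  # hypothetical camera frame rate
N_FRAMES = 600
OCCLUDED = slice(200, 260)  # hypothetical occlusion lasting 60 frames (~1 s)
OBSTACLE_STEP = 0.2         # hypothetical apparent obstacle motion per frame (mm)


def simulate_occlusion(seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(N_FRAMES) / FPS
    true = np.sin(2 * np.pi * 4.8 * t)  # hypothetical 1 mm response at 4.8 Hz

    # Frame-to-frame tracking: displacement is the running sum of increments.
    steps = np.diff(true, prepend=0.0) + rng.normal(0.0, 0.01, N_FRAMES)
    steps[OCCLUDED] = OBSTACLE_STEP     # tracked points latch onto the moving obstacle
    incremental = np.cumsum(steps)

    # Reference-frame tracking: every frame is compared with the reference frame,
    # so an error in one frame does not carry over to the next.
    ref_based = true + rng.normal(0.0, 0.01, N_FRAMES)
    ref_based[OCCLUDED] = np.nan        # no valid estimate while the view is blocked
    return true, incremental, ref_based


true, incremental, ref_based = simulate_occlusion()
after = slice(OCCLUDED.stop, None)
offset_incremental = float(np.mean(incremental[after] - true[after]))
max_error_reference = float(np.max(np.abs(ref_based[after] - true[after])))
```

After the blockage, `offset_incremental` stays close to the spurious motion summed while the view was blocked (60 × 0.2 = 12 mm here), while `max_error_reference` falls back to the noise level as soon as the view clears.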

Conclusions
This paper proposes a non-target, CV-based structural displacement measurement system using reference frame-based Deepflow, POI selection with masking, and signal filtering and averaging techniques. The proposed method directly measures displacement by calculating optical flow with respect to a reference frame, which is updated to provide robust tracking results. In addition, because Deepflow calculates pixelwise optical flow, feature points related to structural displacement can be densely populated. These feature points are filtered and averaged to obtain accurate displacement measurements while removing background noise. The proposed method was experimentally validated on a cantilever beam, and its displacement results were compared with those of a laser displacement sensor. First, the proposed method was compared with the KLT method under stable conditions; because of some incorrect matches by the KLT method, the proposed method achieved a lower RMSE (0.07 mm) than the KLT method (0.29 mm). Note that the KLT method showed drift over the measurement period because of erroneous feature point detection between the structure and the background. Second, displacement was measured under occluded conditions in which the camera's view was entirely blocked for about 2 s. Under this abrupt disturbance, the proposed method tracked displacement without drift, whereas the KLT method missed or incorrectly detected feature points, resulting in significant drift and offset errors. In conclusion, the most significant advantage of the proposed method, implemented with Deepflow, masking, and signal filtering and averaging, is its ability to measure drift-free displacement without targets. Future work will include long-term field experiments and multiple-point tracking for system identification.
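The filtering and averaging step summarized above can be illustrated with a short sketch. It assumes a dense vertical flow field (for example, from a Deepflow implementation) and a boolean POI mask are already available; the median/MAD outlier rule and the threshold `k = 3` are our assumptions, not necessarily the filter the authors used. The sketch also shows why averaging over the whole image, background included, underestimates the structural displacement.

```python
import numpy as np

MAD_TO_SIGMA = 1.4826  # scales the MAD to a standard deviation for Gaussian noise


def masked_displacement(flow_v, mask, k=3.0):
    """Mean vertical flow inside `mask` after rejecting outliers.

    flow_v : 2-D array of per-pixel vertical displacement (pixels)
    mask   : boolean array of the same shape marking points of interest
    k      : rejection threshold in robust standard deviations (assumed value)
    """
    vals = flow_v[mask]
    vals = vals[np.isfinite(vals)]  # drop pixels with no valid flow
    med = np.median(vals)
    robust_sigma = MAD_TO_SIGMA * np.median(np.abs(vals - med))
    keep = np.abs(vals - med) <= k * robust_sigma
    return float(np.mean(vals[keep]))
```

Restricting the average to the mask removes the zero-motion background that biases a global mean toward zero, and the median-based threshold discards isolated spurious matches without needing to know their number in advance.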

Conflicts of Interest:
The authors declare no conflict of interest.