Practical Fluorescence Reconstruction Microscopy for High-Content Imaging

Fluorescence reconstruction microscopy (FRM) is an approach in which transmitted light images are passed into a convolutional neural network that outputs predicted epifluorescence images. This approach offers many benefits, including reduced phototoxicity, freed-up fluorescence channels, simplified sample preparation, and the ability to re-process legacy data for new insights. However, current FRM benchmarks are single scores that are difficult to relate to how trustworthy and useful an FRM prediction is. Here, we relate the conventional benchmark to practical and familiar cell biology analyses to demonstrate that FRM should be judged in context. We further demonstrate that it performs remarkably well even with lower-magnification microscopy data, such as are often collected in high content imaging. Specifically, we present promising results for nuclei, cell-cell junctions, and fine-feature reconstruction; provide experimental design guidelines; and provide the code, sample data, and user manual to enable more widespread adoption of FRM.

Introduction

Deep learning holds enormous promise for biological microscopy data, and offers especially exciting opportunities for fluorescent feature reconstruction [1-5]. Here, fluorescence reconstruction microscopy (FRM) takes in a transmitted light image of a biological sample and outputs a series of reconstructed fluorescence images that predict what the sample would look like had it been labeled with a given series of dyes or fluorescently tagged proteins (Fig. 1A-C) [2,6-10]. FRM works by first training a convolutional neural network (e.g. a U-Net) to relate a large set of transmitted light data to corresponding real fluorescence images (the ground truth) for given markers [11-13]. The network learns by comparing its fluorescence predictions to the ground truth fluorescence data and iterating until it reaches a cut-off. Once trained, FRM can be performed on transmitted light data without requiring any additional fluorescence imaging. This is a powerful capability and allows FRM to: reduce phototoxicity; free up fluorescence channels for more complex markers; and enable re-processing of legacy transmitted light data to extract new information. In all cases, FRM data are directly compatible with any standard fluorescence analysis software or workflows (e.g. ImageJ plug-ins). Such capabilities are extremely useful, and FRM may eventually become a standard tool to augment quantitative biological imaging.

However, a number of challenges limit FRM accessibility to the larger biological community. Key among these is the difficulty of relating the abstract accuracy metrics used to score FRM to the practical value of FRM data for actual, quotidian biological analyses such as cell counting or morphological characterization. To better appreciate this, consider first that the quality of FRM is typically assessed using a single numerical metric (P) such as the Mean-Squared-Error or Pearson's Correlation Coefficient, with typical ranges of (0,1) or (-1,1), and second that it is practically impossible to actually reach perfection (P = 1). P can be pushed closer to 1 either by training with more images, or by using higher-resolution magnification (e.g. 40X-100X) to capture finer details. However, increasing P also carries an intrinsic cost in increased wet-lab and computing time.
That improving P is expensive and that P cannot be perfect beg the questions of how good is good enough, and good enough for what (Fig. 1D)? For instance, P = 0.7 lacks any practical context, and may be quite good enough for a given use case without requiring more work to raise the 'accuracy'. This is why context is extremely important for FRM, and why the work we present here focuses on evaluating practical uses of FRM with respect to given P values.

Our goal here is not to improve FRM performance, but rather to provide a standardized implementation of it and demonstrate its practical performance for everyday tasks such as nuclear localization and tracking, characterizing cell morphology, cell-cell junction detection and analysis, and re-analyzing legacy data and data collected on different systems (Fig. 1). To further emphasize the use of FRM for routine tasks, we focus exclusively on the lower magnifications (4X-20X) commonly used in high content imaging and cellular screening, in contrast to the focus on higher magnifications in prior studies [9,10]. We hope that the software we developed and the analyses and comparison data we present will help make FRM more approachable to the broader biological community.

While early FRM methods used computationally complex and expensive networks that relied on Z-stacks of images to capture 2D reconstructions [9], more recently this has been adapted to reconstruct 3D image stacks using a modified U-Net architecture [10]. The U-Net itself is commonly used in machine learning approaches because it is a lightweight convolutional neural network (CNN) that readily captures information at multiple spatial scales within an image, thereby preserving reconstruction accuracy while reducing the required number of training samples and training time. U-Nets, and related deep learning approaches, have found broad application to live-cell imaging tasks such as cell phenotype classification, feature segmentation [10,14-19], and histological stain analysis [20-23].

Our implementation here provides an archetypal U-Net and framework intended for the cell biology community. Briefly, our workflow is as follows. First, we collected multi-channel training images of cultured cells, where each image comprised a transmitted light channel and associated fluorescence channels (labeled using genetically encoded reporters or chemical dyes; see Methods). These images were then broken into 256x256-pixel sub-images in ImageJ and input into the network. Such image chopping is necessary for the average user, to accommodate the RAM and graphics cards available on standard workstations. These data were then passed through the U-Net to generate trained weights (the pattern-recognition side of the network). Here, the transmitted light images serve as input to the network, which is then optimized to minimize the difference between the intensity values of the output predicted images and the intensity values of the corresponding ground truth fluorescence images (e.g. Fig. 1). This process can be extended to full time-lapse fluorescence reconstruction of video, making it well suited for high-content live imaging (see Movies S1-4). We have provided all of our code, sample raw data, and an extensive user manual (DataSpace, GitHub) to encourage exploration of FRM.

As our conventional performance metric, we selected the Pearson's Correlation Coefficient (PCC), which is commonly used in cell biology when comparing the co-localization of two or more proteins, and also used in computer vision to assess spatial-intensity similarity between images. However, we observed that naively applying the PCC across our whole dataset skewed the results due to the large number of images containing primarily background (common with high content imaging of oddly shaped or low-density samples). We therefore filtered out background-dominated images using an intensity threshold on the fluorescence channel to produce a modified Accuracy Score (see Supplement). Accuracy Scores for each of these data sets are summarized in Fig. 1D and Table S1, and we will next present case studies from each of these data sets before concluding with a discussion of how 'accuracy' relates to visual quality, to help researchers design experiments for FRM.
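For readers who want to reproduce this scoring, the computation itself is short. Below is a minimal Python/NumPy sketch of the per-image PCC and the background filtering described above; the filtering rule shown (skipping image pairs whose mean ground-truth intensity falls below a user-chosen threshold) is an illustrative assumption, as the exact thresholding procedure is described in the Supplement and released code.

```python
import numpy as np

def pearson_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Pearson's correlation coefficient between two images,
    computed over flattened pixel intensities."""
    p = pred.astype(np.float64).ravel()
    t = truth.astype(np.float64).ravel()
    p -= p.mean()
    t -= t.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def accuracy_score(preds, truths, intensity_threshold=0.0):
    """Mean PCC over image pairs, skipping mostly-background pairs
    whose mean ground-truth intensity is below the threshold.
    The threshold value here is a hypothetical placeholder."""
    scores = [pearson_score(p, t)
              for p, t in zip(preds, truths)
              if t.mean() > intensity_threshold]
    return float(np.mean(scores)) if scores else float("nan")
```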

Results

Demonstration of FRM for low-magnification nuclear fluorescence reconstruction and analysis
One of the most common computational image processing needs for high content imaging is nuclei detection or segmentation, which enables cell counting, time-lapse tracking, and statistical analyses of ensemble distribution and geometry. While a variety of traditional image processing approaches exist to extract nuclei from phase or DIC images, the most reliable and standardized technique by far is using a vital dye (e.g. Hoechst 33342 or DRAQ) to stain the nuclei. However, Hoechst requires cytotoxic UV illumination, while DRAQ (far-red fluorescence characteristics) has been linked to cell cycle alterations due to its chemistry [24-26]. Both dyes also exhibit loss of signal over extended time-lapse imaging. Alternately, genetic reporters such as H2B nuclear labels can be engineered into cells (e.g. by transfection or viral transduction).

To validate low-magnification, high-accuracy nuclear FRM, we collected data in both MDCK renal epithelial cells (5X phase contrast) and primary skin keratinocytes (10X phase contrast) while using Hoechst to label nuclei and generate our ground truth training data. Fig. 2A displays the results of our U-Net predictions for these two cases, comparing the ground truth fluorescence (green) with the network predictions (red). These channels are overlaid for direct comparison, where a perfect merger of red and green should appear yellow. The Accuracy Score (P) is included for context.

The network performs visually well in both cases, with P ~ 0.9, but to represent what that means in practice, we quantified disparities in the predictions with respect to nuclear size for geometric accuracy (Fig. 2B) and centroid error for positional accuracy (Fig. 2C). In both cases, the U-Net slightly overpredicts nuclear area, likely because slight noise in the predictions blurs the predicted nuclei and effectively increases their area. However, the distributions in the violin plots are quite similar in structure, and the predictions are well within the usable range for practical cell counting and segmentation. With respect to nuclear centroid localization, mean errors span 2 microns (5X MDCK) to 1 micron (10X keratinocytes). The improvement from 5X to 10X can likely be attributed to the resolution increase with magnification, but in both cases the errors are quite small and more than sufficient for standard cell counting, nuclei tracking, and neighbor distribution analyses. Whether a higher P would be beneficial depends on the specific analysis in question; here, the accuracy is more than sufficient.

As a final demonstration of the utility of low-magnification reconstruction and nuclear tracking, we input legacy data from a 24 hr time-lapse experiment on the growth dynamics of large epithelia (2.5 mm², 5X), and the network output a reconstructed movie of nuclear dynamics (Fig. 2D and Movie S1) compatible with standard nuclear tracking algorithms (e.g. TrackMate in FIJI). Images were captured every 10 minutes, and previous attempts to perform this experiment using fluorescent imaging of Hoechst resulted in pronounced phototoxicity; hence FRM proved highly effective both as an alternative nuclear labeling approach for large-scale, long-term imaging, and as a means to reprocess pre-existing datasets.
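To illustrate the kind of quantification used here, the sketch below mirrors our area and centroid comparisons in Python (scikit-image and SciPy). Our actual pipeline used ImageJ/FIJI auto-thresholding, watershedding, size exclusion, and TrackMate linking (see Methods), so treat the Otsu thresholding and nearest-neighbor matching here as simplified stand-ins.

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def nuclei_stats(image: np.ndarray, um_per_px: float):
    """Segment nuclei by Otsu thresholding and return areas (um^2)
    and centroids (um). A stand-in for the FIJI pipeline, which
    additionally applied watershedding and size exclusion."""
    mask = image > threshold_otsu(image)
    props = regionprops(label(mask))
    areas = np.array([r.area for r in props]) * um_per_px ** 2
    centroids = np.array([r.centroid for r in props]) * um_per_px
    return areas, centroids

def centroid_errors(truth_img, pred_img, um_per_px):
    """Match each ground-truth nucleus to its nearest predicted
    centroid and report the displacement, mimicking a two-frame
    TrackMate linking step."""
    _, c_true = nuclei_stats(truth_img, um_per_px)
    _, c_pred = nuclei_stats(pred_img, um_per_px)
    dists, _ = cKDTree(c_pred).query(c_true)
    return dists  # one displacement per ground-truth nucleus
```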
Fig. 2 caption (excerpt): (K, K', K'') Sample nuclear predictions for the same growing tissue at 0, 12, and 24 hours of growth, respectively. Input data consist of MDCK WT cells imaged at 5X magnification and montaged; the U-Net was applied in a sliding-window fashion to predict small patches of the image in parallel. Scale bar: 1 mm.

Sharing FRM networks across laboratories and microscope hardware
In practice, it is far more convenient to deploy a pre-trained network than to re-train one. While this is not always possible, when the same cell type and markers are desired at similar magnifications it would facilitate sharing across different labs and further improve access to FRM. The problem is that different labs typically have different optical platforms, meaning that the microscope, objective, illumination, and camera may all differ, which can prevent accurate reconstruction. However, it should be possible to correct for some of these differences in certain cases. As a simple demonstration, we collected and trained nuclear data using a Zeiss Observer microscope with a 5X phase contrast objective and a camera with 42 µm² pixels, and then tested that network against samples imaged on a Nikon Ti2 with a 4X phase contrast objective and a camera with 53 µm² pixels. In addition to other optical differences, this reflects a 25% difference in magnification and a 26% difference in pixel area. Naive testing of the network resulted in extremely poor performance by visual assessment (Fig. 3). To compensate, we propose an 'Inter-System Correction Factor' that attempts to correct for simple differences in optical hardware by rescaling the raw data into something closer to what the U-Net expects. This process flow is laid out visually in Fig. 3, where there is a clear performance improvement, suggesting some degree of cross-platform compatibility, perhaps enough to test out a network. That said, it remains ideal to collect training data on the system where the primary data will be captured.

Fig. 3 caption (excerpt): A network is trained using data collected at low magnification (5X) on one system (Zeiss) and used to process new images collected on a different system (Nikon) at a similar magnification (4X), after the images have been appropriately rescaled as a function of the Inter-System Correction Factor. Representative ground truth nuclei from MDCK WT cells imaged on the Nikon system are shown next to the corresponding cross-platform predictions resulting from both scaled and unscaled input images. Images were contrast-adjusted for reproduction by normalizing the histograms and shifting the lower bound of the histogram up by 1/4 of the dynamic range. Scale bar: 50 μm.
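The idea behind the correction is simply to rescale test images so that their effective (object-space) pixel size matches that of the training system. The sketch below shows one plausible form of that correction, under the assumption that the factor is the ratio of camera pixel pitch divided by magnification between the two systems; the process we actually used is laid out in Fig. 3, so treat this as illustrative rather than our exact definition.

```python
import numpy as np
from skimage.transform import rescale

def inter_system_correction_factor(mag_train, px_area_train_um2,
                                   mag_test, px_area_test_um2):
    """Ratio of object-space pixel sizes (camera pixel pitch divided
    by magnification) between test and training systems. This formula
    is an assumption for illustration."""
    pitch_train = np.sqrt(px_area_train_um2)  # um per camera pixel
    pitch_test = np.sqrt(px_area_test_um2)
    return (pitch_test / mag_test) / (pitch_train / mag_train)

# Zeiss training system (5X, 42 um^2 pixels) vs. Nikon test system
# (4X, 53 um^2 pixels), as in the text: factor ~ 1.40 (upsample).
factor = inter_system_correction_factor(5.0, 42.0, 4.0, 53.0)

def correct_for_system(test_image, factor):
    # Resample the test image so its effective pixel size matches
    # what the trained U-Net expects (bilinear interpolation).
    return rescale(test_image, factor, order=1, preserve_range=True)
```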
In practice, high content imaging is inherently a trade-off between throughput and resolution. The more detail we can extract from lower-magnification images, the more efficient the imaging and analysis. Here, we demonstrate the practical performance of FRM with a 20X/0.8 NA objective by reconstructing fluorescence signatures for several useful sub-cellular markers in HUVEC cells that stably expressed VE-Cadherin:YFP (mCitrine) and were labeled with Hoechst 33342 (live nuclear dye) and SiR-Actin (far-red live actin dye). Processed time-lapse data (see Movie S3) highlight the variation of these fluorescent features given the same input (DIC) image.

As a baseline, we characterized prediction accuracy for cell nuclei, as the nucleus itself is relatively low resolution, whereas detection of sub-nuclear features requires higher accuracy. By contrast, the F-actin reconstruction captured primarily the peripheral actin cables while blurring internal features (see line profile). We hypothesize this is primarily due to fundamental limitations of DIC imaging and the lack of contrast for intracellular F-actin, but it may also be due to the network over-prioritizing cortical filaments and the diffuse cytoplasmic signal. However, in practice we found that these FRM data were useful for general cell body detection and potential segmentation analyses due to the relatively homogeneous reconstructed fluorescence in the cytoplasmic space.

Comparing FRM visual performance to P scores, training set size, and network architecture
A key feature of FRM is that its performance can often be increased by collecting more training data, which in turn ought to improve P. However, P will never be perfect, nor is a perfect P necessary for many practical analyses. Fig. 6A demonstrates that the rate of change in quantitative quality (P) vs. training set size is neither linear nor uniform across different biomarkers. Further, the actual predicted images shown in Figs. 6B,C offer further nuance: they demonstrate that training the network against even a single image would be sufficient to capture nuclei for the purposes of tracking or segmentation, while just 6 images would be sufficient to capture cell shape and junctional geometry in epithelia, assuming the researcher were willing to perform some simple manipulations such as background subtraction. There is an obvious performance increase for both cadherins when the training set comprises several hundred images, but it is difficult to visually detect a difference between nuclei reconstructed from 6 or 400 images.

An alternate way to improve FRM would be to alter the U-Net architecture. Here, we first compared the standard U-Net to a neural network architecture that was essentially two U-Nets stacked end-to-end with additional residual connections. Such an approach has been shown to improve network depth and performance in other applications [31-33]. Here, however, we observed no benefit to training a deeper network (see Fig. S2). Further, given the significant temporal and computational cost, we advise against its use for this kind of FRM. Alternately, we explored the role of the loss function, testing our Pearson's-based loss function against the traditional Mean-Squared-Error loss function (see the sketch below), and found no significant difference (Fig. S3; Methods). Hence, we conclude that our minimal U-Net implementation performs well as a foundation for a variety of daily analysis tasks without requiring significant fine-tuning.

Fig. 6 caption (excerpt): [...] Table S1 for each experimental condition. Then, random images representing a fraction of the total training set are used to train a new U-Net from scratch. (C, D) display representative images for the HUVEC 20X dataset and the MDCK 20X dataset, respectively, with predictions shown for various training set sizes. This type of analysis may assist users in collecting enough data for their task-specific quality requirements. All scale bars represent 30 μm.
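For reference, a Pearson's-based loss can be implemented by minimizing 1 - PCC per image. The TensorFlow-style sketch below shows the general form; the exact implementation we trained with is in our released code, so treat this as a hedged approximation rather than our precise loss.

```python
import tensorflow as tf

def pearson_loss(y_true, y_pred):
    """Loss = 1 - PCC, so that maximizing correlation minimizes loss.
    Computed per image over flattened pixels, then averaged over the
    batch. Illustrative form only; see the released code for the
    exact loss used in the paper."""
    yt = tf.reshape(y_true, [tf.shape(y_true)[0], -1])
    yp = tf.reshape(y_pred, [tf.shape(y_pred)[0], -1])
    yt -= tf.reduce_mean(yt, axis=1, keepdims=True)
    yp -= tf.reduce_mean(yp, axis=1, keepdims=True)
    num = tf.reduce_sum(yt * yp, axis=1)
    den = tf.sqrt(tf.reduce_sum(yt ** 2, axis=1) *
                  tf.reduce_sum(yp ** 2, axis=1)) + 1e-8
    return tf.reduce_mean(1.0 - num / den)

# Usage with a Keras model, vs. the MSE baseline:
# model.compile(optimizer="adam", loss=pearson_loss)  # or loss="mse"
```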
Discussion

As our data demonstrate, a single accuracy score need not match our own assessments of value and quality.

As a practical example, compare the FRM performance for E-cadherin (P = 0.73; Fig. 3) and F-actin (P = 0.67; Fig. 5). While the accuracy metrics differ by less than 10%, the FRM of F-actin detected only peripheral actin cables, otherwise blurring all internal features into a homogeneous signal. Nonetheless, even this plainly 'inaccurate' signal could prove useful for cytoplasmic reconstruction and tracking. In stark contrast, the E-cadherin data were much more visually accurate and also captured key quantitative features of the ground truth, such as junctional localization and intensity, and even the subtle intensity gradients representing 3D morphology, despite only a slight improvement in P. Yet a score of 0.73 is far enough from 1 that it is ambiguous in the absence of a specific analysis, which is why FRM must be evaluated in the context of a given question or analysis.

Practical considerations for training on new, low-magnification data

We specifically targeted the lower-magnification end of the imaging spectrum to explore how well FRM performs at magnifications more commonly used for high content imaging applications, such as time-lapse studies of very large cellular colonies or massive screens using multiwell plates. Our data indicate that such magnifications can be effectively combined with FRM for applications spanning nuclear tracking, cell-cell junction analysis, and certain fine-structure reconstruction, even at just 20X.

A particular concern for the average user of a complex machine learning process is the size of the required dataset, as this can impose potentially strenuous experimental demands. However, our characterization of FRM performance vs. dataset size again shows the importance of context: relatively few images are needed for quite accurate nuclei reconstruction, while a greater number are needed for junction reconstruction (Fig. 6). We also note that our largest training set comprised at most 500 camera images at 20X (approximately one six-well plate), something easily obtained with a standard automated microscope and still compatible with manual capture. Further, a very common approach in machine learning is to 'augment' an image dataset by performing reflections and rotations on images such that the network perceives each augmented image as a different datapoint, thereby virtually increasing the size of the dataset (see the sketch at the end of this section). We did not perform such augmentation here for the sake of simplicity and transparency, which suggests that significantly smaller datasets, if augmented, could still produce good results.

Another advantage of FRM is that training against experimentally captured fluorescence ground truth (chemical dyes or genetically encoded reporter proteins) obviates the need for any manual annotation or pre-processing, which is often quite time consuming and subjective. This means that an FRM image can be directly incorporated into any existing analysis pipeline intended for fluorescent images, including traditional threshold-based segmentation approaches. Further, more of the original data is preserved in an FRM image, allowing the capture of features such as fluorescence intensity gradients (e.g. Fig. 3) and features that might be lost during traditional binary segmentation.
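Since we mention augmentation but did not use it, the following sketch only illustrates the standard flip/rotation scheme discussed above: each training pair yields the eight symmetry variants of the square (rotations and mirrored rotations), virtually octupling the dataset.

```python
import numpy as np

def d4_augment(phase_img, fluor_img):
    """Yield the eight flip/rotation variants (the dihedral group D4)
    of a paired training example. Each variant can be presented to
    the network as an independent datapoint. Not used in this work;
    shown only to illustrate the augmentation discussed above."""
    pair = (phase_img, fluor_img)
    for k in range(4):
        # Rotate both channels together so they stay registered.
        rot = tuple(np.rot90(im, k) for im in pair)
        yield rot
        yield tuple(np.fliplr(im) for im in rot)
```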
Concluding remarks

Here, we characterize the value of fluorescence reconstruction microscopy (FRM) for the everyday analysis tasks facing researchers in cell biology. We specifically highlight the need for individual researchers to explore and evaluate FRM in the context of specific research questions rather than accuracy metrics alone. We also highlight the surprisingly good performance of FRM even with lower-magnification imaging and relatively fine structures such as VE-cadherin fingers. Finally, we have made all of our tools and key training datasets publicly available to improve accessibility and provide a starting point for researchers new to FRM to easily explore it for themselves and to eventually build on and improve.

Methods

Cell culture

[...] Biological) and penicillin/streptomycin. HUVEC endothelial cells stably expressing VE-cadherin:mCitrine were cultured using the Lonza endothelial bullet kit with EGM2 media according to the kit instructions. Primary murine keratinocytes were isolated from neonatal mice (courtesy of the Devenport Laboratory, Princeton University) and cultured in custom media [35]. All cell types in culture were maintained at 37 °C and 5% CO2 in air.

Preparation of training samples
We collected training data using 3.5-cm glass-bottomed dishes coated with an appropriate ECM. To coat with ECM, we incubated dishes with 50 µg/mL of either collagen-IV (MDCK) or fibronectin (HUVEC, primary keratinocytes) in PBS for 30 min at 37 °C before washing 3 times with DI water and air-drying the dishes.

To capture a variety of conditions within a single plate and ensure a broad training sample, we placed silicone microwells into the dishes as described in [36], seeding at densities of 1-2x10⁶ cells/mL, which ultimately allowed single cells, low-density confluent monolayers, and high-density confluent monolayers to be captured. Silicone microwells consisted of 3x3 arrays of 9 mm² wells into which we added 4 µL of suspended cells in media, allowed them to adhere for 30 min in the incubator (6 hrs for keratinocytes), added media, and returned the dishes to the incubator overnight prior to imaging. To further ensure variability, several dishes were also randomly seeded with cells for each cell type.

Fluorescent labeling for ground truth data
We used the live nuclear dye NucBlue (ThermoFisher; a Hoechst 33342 derivative) with a 1 hr incubation for all nuclear labeling. We used SiR-Actin (Spirochrome) at 10 µM for live F-actin labeling in HUVECs. All other labels were genetically encoded reporters, as described.

Image Acquisition
5X MDCK data were collected on a Zeiss Observer Z1 inverted fluorescence microscope using a 5X/0.16 phase-contrast objective, an sCMOS camera (Photometrics Prime), and Slidebook software (Intelligent Imaging Innovations, 3i). An automated XY stage, a DAPI filter set, and a metal halide lamp (X-Cite 120, EXFO) allowed for multipoint phase contrast and fluorescence imaging.

All other epifluorescence imaging was performed using a Nikon Ti2 automated microscope equipped with a 10X/0.3 phase objective, a 20X/0.75 DIC objective, and a Qi2 sCMOS camera (Nikon Instruments). Time-lapse imaging effectively increased dataset size, provided sufficient time elapsed between frames to avoid overfitting in the U-Net: MDCK data were collected at 20 min/frame, while HUVECs and keratinocytes were imaged at 60 min/frame. Standard DAPI, CY5, and YFP filter sets were used. Confocal sections of E-cadherin fluorescence in MDCK cells (Fig. 3) were collected using a Leica SP8 scanning confocal microscope tuned for dsRed excitation/emission.

All imaging was performed at 37 °C with 5% CO2 and humidity control. Exposures varied, but were tuned to balance histogram performance against phototoxic risk. Data with any visible sign of phototoxicity (blebbing, apoptosis, abnormal dynamics) were excluded entirely from training.

Data Pre-Processing and Training
Prior to input to the network, raw images were segmented into 256x256-pixel sub-images, ensuring consistent slicing across the transmitted-light image and the corresponding fluorescence image. The images were then normalized by statistics collected across all images in each channel: that is, from each image we subtracted the channel mean and divided by the channel standard deviation. A test-train split was applied, such that a random 20% of the total images were held out to comprise the test set. Additionally, 10% of the training data subset were held out for validation, as is standard.

The U-Net style architecture shown in Figure 1 was trained using TensorFlow [37]. Sample training loss plots are provided (Fig. S4).

For the low-magnification experimental conditions, a nuclear area comparison was performed between corresponding ground-truth and predicted images. First, both sets of nuclear images were segmented independently using standard auto-thresholding, watershedding, and size exclusion (to exclude clusters) in ImageJ/FIJI, and outliers were then removed using the MATLAB function rmoutliers(). We additionally report centroid-centroid displacement values for the same segmented images. The ImageJ/FIJI plugin TrackMate was used to determine displacements between the ground truth and predicted images, as if they were two frames of a video. Standard TrackMate settings were used, and outliers were removed using the MATLAB function rmoutliers() for reporting.

Where intensity plots for line slices are reported, a line was selected as an ROI in ImageJ/FIJI and the intensity values were exported for analysis.

New large transmitted-light images were processed using a sliding-window technique: we analyzed 256x256-pixel patches of the input image with a stride of 64 pixels in each direction (a minimal sketch follows below). The border of each predicted patch was excluded in the sliding-window process, as features near patch borders are likely to have lower accuracy (often because cells are cut off). The sliding-window predictions at each pixel were then averaged to produce the final large predicted image. Time-lapse movies can be processed on a frame-by-frame basis. If scaling was required, as described in Fig. 3, the input was scaled in FIJI and then passed to the network for analysis.
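A minimal NumPy sketch of this sliding-window procedure is shown below. The patch size (256) and stride (64) follow the text; the width of the excluded border and the Keras-style model.predict call are assumptions for illustration, as the released code contains the exact implementation.

```python
import numpy as np

def predict_large_image(model, image, patch=256, stride=64, border=16):
    """Sliding-window reconstruction of a large transmitted-light
    image: patches are predicted independently, their borders
    discarded, and overlapping predictions averaged. `border` is an
    assumed value; the text states only that patch borders were
    excluded."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.float64)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tile = image[y:y + patch, x:x + patch]
            # Assumed Keras-style call: add batch and channel dims.
            pred = model.predict(tile[None, ..., None])[0, ..., 0]
            ys = slice(y + border, y + patch - border)
            xs = slice(x + border, x + patch - border)
            out[ys, xs] += pred[border:-border, border:-border]
            counts[ys, xs] += 1.0
    # Pixels never covered (outer margins in this simplified sketch)
    # remain zero; divide where predictions accumulated.
    return out / np.maximum(counts, 1.0)
```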

Code and Dataset Availability
All code used for pre-processing data, training the network, testing a trained model, and applying the model to new images, along with an extensive user manual, can be found at: https://github.com/CohenLabPrinceton/Fluorescence-Reconstruction. Additionally, sample datasets for testing purposes, along with pre-trained network weights, are available through our Dataspace server (see the GitHub README file).

Acknowledgements
Special thanks to Gawoon Shim for assistance with HUVEC and keratinocyte data collection.

Supplementary figure caption (excerpt): [...] were not removed; only intensity thresholding was performed to produce the modified P from the PCC. By filtering the PCC results by an intensity threshold in the fluorescent images, we remove low-scoring background images, which bias our accuracy score on the complete dataset. Visual inspection of the plot reveals the low-scoring images as "bumps" near 0.0.