Automated tracking of gene expression in individual cells and cell compartments

Many intracellular signal transduction processes involve the reversible translocation from the cytoplasm to the nucleus of transcription factors. The advent of fluorescently tagged protein derivatives has revolutionized cell biology, such that it is now possible to follow the location of such protein molecules in individual cells in real time. However, the quantitative analysis of the location of such proteins in microscopic images is very time consuming. We describe CellTracker, a software tool designed for the automated measurement of the cellular location and intensity of fluorescently tagged proteins. CellTracker runs in the MS Windows environment, is freely available (at http://www.dbkgroup.org/celltracker/), and combines automated cell tracking methods with powerful image-processing algorithms that are optimized for these applications. When tested in an application involving the nuclear transcription factor NF-κB, CellTracker is competitive in accuracy with the manual human analysis of such images but is more than 20 times faster, even on a small task where human fatigue is not an issue. This will lead to substantial benefits for time-lapse-based high-content screening.


INTRODUCTION
There is increasing interest, in both cell biology and drug discovery, in knowing both the amount and the spatial distribution of specific proteins in individual cells. Aided by the development of luciferase tags and fluorescent proteins (e.g. Tsien 1998), optical methods can now be used to effect this, leading to a huge increase in cell-based or so-called 'high-content' screening assays (Dove 2003;Abraham et al. 2004;Giuliano et al. 2005;Grånäs et al. 2005;Mitchison 2005;Bailey et al. 2006). Many of these assays are currently performed at fixed time points, whereas it is becoming clear that for systems biology modelling it is important to track cellular functions in single cells over time. A major limitation for this has been the lack of analysis tools for time-lapse imaging in single cells. While the scoring of such image-based assays can (of course) be done manually, the volumes and complexity of the data generated make the development of automated scoring procedures highly desirable.
A number of commercial software systems offer general-purpose image-processing capabilities, while some of the commercial integrated hardware systems, based on automated microscopy, incorporate software that is designed to form part of specific assay kits, often in fixed cells. However, as part of a programme in understanding spatial signal transduction using livecell imaging in single cells (e.g. Nelson et al. 2002aNelson et al. ,b, 2003Nelson et al. , 2004, it became clear that none of these was suitable for our needs, more specifically because living cells are motile and change shape throughout the course of time-lapse experiments. In addition, we wished to have a robust data model that would allow us to store and retrieve the images in a structured, effective, intelligent and systematic manner, much as in the emerging standard for the open microscopy environment (Swedlow et al. 2003;Goldberg et al. 2005).
Carpenter, Sabatini and colleagues have developed open source software for cellular image processing (e.g. Wheeler et al. 2004;Bailey et al. 2006 and www.cellprofiler.org/), but it is not designed for tracking moving (living) cells. We have therefore developed, and here describe, CellTrackeran image-processing environment designed for the analysis of high-content cellular images.

METHODS
Compared with most tracking tasks, CellTracker tracks not only cell positions, but also their boundaries. With cell boundaries available, one may measure gene expression level during dynamical processes as well as other cellular properties, e.g. morphology. There are three types of boundaries in the CellTracker, i.e. cellular, nuclear and user-defined. The user-defined boundaries can be used for tracking cellular compartments other than the cytoplasm and the nucleus. Boundaries are defined as two-dimensional cubic spline curves with the properties given in table 1. Only the control points of boundaries are usually used in the tracking, which also makes boundary editing easy. CellTracker provides management tools for boundaries over a time-series, e.g. copy, edit, delete, rename and conversion. It also includes operators for simple boundary processing activities, such as expansion and contraction.
The menus and logical structure of CellTracker are illustrated in figure 1. CellTracker offers various visualizing tools. An image series may be inspected using various combinations of imaging channels. Boundaries are plotted as overlays upon image. Operations in CellTracker can be classified at the levels of the image, boundary or cell. Image processing and cell tracking are confined to user-selected regions of interest that persist over a sequence of images.

Image processing
Compared with natural images, image intensities inside cells are often not homogeneous and there can be large differences between the cells in a single image. As can be seen from figure 1, a number of image-processing techniques have been included, e.g. image blurring, gradient, edge detection, morphological transforms, local normalization and texture feature operations. Preview windows are usually given to help users to choose proper parameters. CellTracker also provides an image-processing guide under the help menu.

Boundary detection
Since CellTracker uses the fluorescent channels for object detection, the cells themselves may be detected based on image intensities, e.g. via thresholding and level set (Vese & Chan 2002). If an initial boundary is given, the CellTracker may refine the cell boundary based on the detected edges or absolute intensities. The active contour ('snakes') is an algorithm based on edge information (Kass et al. 1987), and is a curve cðrÞZ ½xðrÞ; yðrÞ that moves within an image to minimize the energy function where a, b specify the elasticity and stiffness (respectively) of the active contour. The external energy function E ext is derived from the image so that it takes on its smallest values at the features of interest, such as boundaries. There are quite a few formulations of the active contour. We use the algorithm developed by Xu & Prince (1998), where the external force is defined as the gradient vector flow (GVF) field. The main advantages of GVF active contours are: (i) a longer capture range to guide the contour towards the desired boundary, and (ii) an ability to progress into boundary concavities. The latter is very useful since cell boundaries may have sharp corners. A Voronoi-based segmentation method is also used to find cytoplasmic regions (Jones et al. 2005). Given seeds for a region, the segmentation process is guided by the appearance of the cells. A metric is defined as where F is the image, g is a blurring filter with a small radius and I is the 2!2 identity matrix. l is a regularization parameter, which makes the metric more Euclidean as it increases. Given the metric above, each foreground pixel is assigned to the nearest seed within the manifold defined by the metric. Boundaries between regions are specified where adjacent pixels are assigned to different seeds.
2.3. Boundary tracking 2.3.1. Shape model. In many tracking applications, the object shape is modelled using a two-dimensional planar affine transform ) with only one shape template. In most cases, the nuclear boundaries undergo limited changes, which may also be modelled using the conventional approach. Any allowed shape vector Q can be represented by a shape space W.
Q Z Ws: ð2:3Þ Here, s is a vector for estimating Q within the shape space defined by W. The shape matrix for affine Here, 1 and 0 in equation (2.4) are vectors with 1s and 0s, respectively. Q x and Q y are the x and y coordinates of shape template on the key frames. The first two columns of W govern horizontal and vertical translations, respectively. Q x and Q y are chosen to have their centroids at the origin so that the third and thereafter columns are associated with shape changes only. The nuclear boundary can usually be represented by equation (2.4). In some cases, the shape of the nuclei approximates to an ellipsoid. The tracking parameter s contains the centre, radii and orientation of the ellipse. The affine shape variation is true only for a very limited number of cellular boundaries. They may be better approximated by a linear combination of those in the key frames, and the system thus uses a couple of representative boundaries as templates. A motion of translation and a linear combination of key frames 1 and 2 can be written as If the cell boundaries cannot be represented by a limited number of templates, the Voronoi-based boundary detection method (described in §2.2) may be employed, using the nuclear boundaries as seeds.
2.3.2. Tracking features. Given a contour, the possibility of its being aligned with cellular boundaries may be estimated based on the appearance, edge or colour features. Correlation-based tracking is a traditional approach based on object appearance. It is incapable of tracking boundaries with varying shape and size. In edge-based tracking, an observation is made normal to a set of points chosen to lie on a contour; here, we used evenly spaced points. In the tracking algorithm, we sample a series of random contours in the image. Given edges detected along a normal of the contour, the probability of a sample reflecting a true contour point is estimated as follows : The parameter s is similar to the standard deviation in a normal distribution and is set according to the accuracy of the shape model. The more accurate a shape model, the smaller is s. l is related to the prior probability of the contour point not being detected by edge detection. d m is the distance between the contour and the edge along its mth normal. m is the maximum distance between the contour and edge points under consideration. The probability of a hypothetical contour aligning with the true contour is estimated by multiplying the probabilities of edges along all the normal lines. If there is no edge detected, as we can seen from equation (2.6), f ðd m ; mÞZ m 2 .
In colour-based tracking (Nummiaro et al. 2003), the colour template is characterized by the colour histogram in a region. The similarity between the colour distributions is measured by the Bhattacharyya distance d (see appendix). The observation probability of each sample is specified by a Gaussian with standard variation s (equation (2.7)). More details about colour template tracking can be found in the appendix. For tracking using key frames, we assume that the cell boundaries vary steadily over time. The shape changes between frames, Ds, may be calculated beforehand. A small amount of random shape variation is added as well because the shape change is not necessarily evenly distributed between the key frames.
2.3.4. Particle filter. The objective of tracking is recursive estimation of the boundary position and size s n of the filtering distribution given a series of observations y 1:n Z ðy 1 ; .; y n Þ. In the framework of probability theory, the objective is to estimate the conditional probability of s n , i.e. pðs n jy 1:n Þ. Generally, there are no analytical solutions for the above formulation. The particle filter (Doucet et al. 2001) is a Monte Carlo implementation of general recursions, in which the filtering distribution is represented by samples/particles with associated important weights p N ðs n jy 1:n ÞZ P N iZ1 p ðiÞ n d s ðiÞ n ðds n Þ. More details can be found in , , Doucet et al. (2001) and Shen et al. (2006).

Cell tracking
The term 'cell tracking' here means tracking of both nuclear and cytoplasmic boundaries for each cell. The first step in tracking is initialization. Initialization of the cell boundaries may be generated by the cell detection algorithm described earlier (or may be done manually). The tracking parameters vary according to the combination of tracking algorithms mentioned earlier. A chart of possible combinations available via the interface is given in figure 2. CellTracker provides a wizard to guide users to a reasonable combination. Note that CellTracker provides tracking at the boundary level, and the software provides an interface for tracking selected boundaries using a variety of methods.

Import and export
In CellTracker, a time-lapse image series is imported for tracking. Currently, it supports Carl Zeiss LSM, tiff and Matlab mat files. The LSM file is essentially an extension of the TIFF multiple image stack file format, and it thereby accommodates any number of userdefined channels, e.g. those based on the fluorescence at different wavelengths. CellTracker may export a variety of calculated image data and a video of selected snapshots. The CellTracker software produces cell boundaries for each frame. One can also export cell properties, such as the nuclear and cytoplasmic average intensities, into Microsoft Excel and to an XML file. The XML file, whose schema we give on the website http://dbkgroup.org/celltracker, includes the tracking methods and cell boundaries for each frame, which can be exported into our information management system. With cell boundaries and image data available, users may calculate other properties without using the CellTracker.

EXPERIMENTAL
SK-N-AS cells (Nelson et al. 2004) were plated in 35 mm glass-bottomed tissue-culture dishes (Iwaki, Japan) containing 3 ml of minimal essential medium with Earle's salts (Gibco, UK) plus 10% (v/v) foetal bovine serum (Harlan Seralab, UK) and 1% nonessential amino acids (Gibco, UK). Twenty-four hours post-plating, the cells were co-transfected with p65-DsRed and pEGFP-N1 expression vectors, which produced a red fluorescent p65 (relA) fusion protein and enhanced green fluorescent protein. Twenty-four hours post-transfection, the cells were stimulated with TNF-a and imaged by confocal laser scanning microscopy. The microscopy was carried out using a Zeiss LSM510 confocal microscope equipped with a humidified CO 2 incubator (37 8C, 5% CO 2 ) and a 40!immersion objective (numerical apertureZ1.3). Excitation of enhanced green fluorescent protein was performed using an argon ion laser at 488 nm and the emitted light was detected that was reflected through a 505-550 nm bandpass filter from a 545 nm dichroic mirror. DsRed fluorescence was excited using a green helium-neon laser (543 nm) and was detected through a 545 nm dichroic mirror and a 560 nm longpass filter. Data capture and manual analysis were carried out with LSM510 v. 3.2 software (Zeiss, Germany). The mean fluorescence intensities per pixel of DsRed fusion proteins were calculated for each time point for both nuclei and cytoplasm using the physiology option in the LSM510 v. 3.2 software, from which the nuclear to cytoplasmic (Nuc : Cyto) fluorescence intensity ratios were calculated and plotted using Microsoft Excel.
The automatic tracking has been tested on various computers. The running time given in this paper was based on the results using a portable computer with 1.7 GHz Intel CPU and 1 Gb RAM. Figure 3a shows four typical frames recorded in an NF-kB signalling experiment. The cells touch each other, and the cell positions and boundaries change over time. It is therefore very time consuming to draw the cell boundaries manually. In this example, no nuclear or cytoplasmic dyes were added to aid tracking. However, the signal from the p65-DsRed fusion protein often provides a clear contrast between the nucleus and cytoplasm of the cells, and can be used for tracking via the image edge features. The fluorescent intensities between cells may vary considerably between cells and over time. This makes it difficult to find edge features using fixed parameters. In this example, we first smooth the images using a median filter of size 5, and normalize the images using a local average and its standard deviation. The normalized images of the p65-DsRed channel are shown in figure 3b. The cell fluorescent intensities after normalization are then at the similar levels. Note that the shape of the nuclei approximates to ellipsoids, which may be used as a template. On the other hand, the shape of the cytoplasm is both irregular and highly variable, and it is difficult to define a limited number of shape templates. Using the nuclei as seeds, the segmentation method based on Voronoi can be applied to estimate the cellular boundaries. The results (see figure 3b and    shows the average fluorescent intensity of nuclei and cytoplasm measured using CellTracker, while figure 4b gives the equivalent nucleus : cytoplasm (Nuc : Cyto) ratio. There is a clear oscillation pattern in the profiles (as seen previously, e.g. Nelson et al. 2004). The results obtained by the CellTracker agree with these profiles obtained using manual analysis (figure 4c), but were obtained over 20 times more quickly (approx. 5 versus 120 min). Obviously, due to human fatigue, this ratio will grow substantially, and the automated data outstrip the manual analyses in terms of quality, with increases in the number of images analysed. (The gene expression levels of cell 3 are not shown here because the image intensities saturated at some time points.) In high-content image analysis, it is often the case that populations of cells (often dead or fixed) are analysed as a whole. The cells in this example show clearly that the location of signalling proteins can oscillate in each cell but that they are out of phase with each other when comparing different cells. This causes them to be damped out if they are analysed at the level of the population (cf. Davey & Kell 1996;Nelson et al. 2002aNelson et al. , 2004. The dotted line in figure 4b shows the average Nuc : Cyto ratio of cells 1 and 2. Its oscillation pattern is clearly quite different quantitatively from those of the individual cells, and as the properties of more cells are time averaged, this pattern becomes increasingly blurred. Therefore, it is impossible to establish the mechanisms of signalling that occur in individual cells, and effect their comparison with systems biology-type models, using results from a population. Indeed, based on single-cell analyses, it was reported that the functional consequences of NF-kB signalling may in fact depend not only on the signal amplitude, but also on the number, period and frequency of these oscillations, i.e. their detailed dynamics (Nelson et al. 2004;Kell 2006), underscoring the importance of single-cell measurements.

RESULTS AND DISCUSSION
The above paragraphs have summarized many of the chief properties of CellTracker, but many other features are available and are described in full in the software itself and its manual. To this end, we have made CellTracker available for download via the URL http://dbkgroup.org/celltracker/, together with a variety of files illustrating various features including multiparameter analyses. We believe that it has the most comprehensive facilities available for live-cell tracking, and trust that it may prove useful to the highcontent screening community.
We thank the UK Department of Trade and Industry, under the terms of the Beacon project scheme, and the BBSRC for financial support. We thank various members of the White Lab for their helpful and critical comments during the development of this software.

APPENDIX A. COLOUR-BASED TRACKING
The colour distribution p y Z fp u y g uZ1;.;m of a region R at location y is calculated as where d is the Kronecker delta function and h(x i ) assigns one of the m-bins of the histogram to a given colour at location x i . The variable a provides invariance against scaling of the region. The colour distribution is normalized to ensure that P m uZ1 p u y Z 1. A popular measure for assessing the difference between two distributions is the Bhattacharyya coefficient (Nummiaro et al. 2003). Considering discrete densities, such as twocolour histograms pZ fp u g uZ1;.;m and qZ fq u g uZ1;.;m , the coefficient is defined as rðp; qÞ Z X m uZ1 ffiffiffiffiffiffiffiffiffi ffi p u q u p : ðA 2Þ The larger the r is, the more similar are the distributions. For two identical histograms, we obtain rZ1, indicating a perfect match. The distance between two distributions is measured by Bhattacharyya distance d Z ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1Krðp; qÞ p : ðA 3Þ The quality of the colour-based particle filter can be influenced by the illumination conditions, the visual angle and the camera parameters. To overcome this, we update the target model when image observations are slowly changing. By discarding image outliers where the object is occluded or too noisy, the tracker can be protected against updating the model when the object has been lost. Thus, we use the update rule p E½s O p T where p E½s is the observation probability in terms of the mean state E [s] and p T is a threshold. The update of the target model is implemented by the equation for each bin u, where a weighs the contribution of the mean state histogram p E½s t to the target model q tK1 . Thus, we invoke a 'forgetting' process, in the sense that the contribution of a specific frame decreases exponentially the further it lies in the past.