Integrating Embryonic Development and Evolutionary History to Characterize Tentacle-Specific Cell Types in a Ctenophore

Abstract
The origin of novel traits can promote expansion into new niches and drive speciation. Ctenophores (comb jellies) are unified by their possession of a novel cell type: the colloblast, an adhesive cell found only in the tentacles. Although colloblast-laden tentacles are fundamental for prey capture among ctenophores, some species have tentacles lacking colloblasts and others have lost their tentacles completely. We used transcriptomes from 36 ctenophore species to identify gene losses that occurred specifically in lineages lacking colloblasts and tentacles. We cross-referenced these colloblast- and tentacle-specific candidate genes with temporal RNA-Seq during embryogenesis in Mnemiopsis leidyi and found that both sets of candidates are preferentially expressed during tentacle morphogenesis. We also demonstrate significant upregulation of candidates from both data sets in the tentacle bulb of adults. Both sets of candidates were enriched for an N-terminal signal peptide and protein domains associated with secretion; among tentacle candidates we also identified orthologs of cnidarian toxin proteins, presenting tantalizing evidence that ctenophore tentacles may secrete toxins along with their adhesive. Finally, using cell lineage tracing, we demonstrate that colloblasts and neurons share a common progenitor, suggesting the evolution of colloblasts involved co-option of a neurosecretory gene regulatory network. Together these data offer an initial glimpse into the genetic architecture underlying ctenophore cell-type diversity.


Background
Insight into how novelty is generated is important for understanding the origin and diversification of multicellular life. An outstanding challenge, however, is finding a model for which the direction of evolutionary change is known and the novelty of interest is easy to characterize. Ctenophores (comb jellies) are gelatinous marine invertebrates that diverged from the rest of animals over 800 Ma (Dohrmann and Worheide 2017); although their phylogenetic position remains contentious (Dunn et al. 2008;Ryan et al. 2013;Moroz et al. 2014;Borowiec et al. 2015;Simion et al. 2017;Whelan et al. 2017), they are clearly among the first lineages to diverge from the rest of animals. While they share several anatomical features with bilaterian animals (e.g., neurons and muscle cells), ctenophores are defined by two novel traits: parallel rows of cilia organized into "combs," and colloblasts, the adhesive cells used to capture prey (fig. 1). Found exclusively in the tentacles, colloblasts are typified by a crown of adhesive-filled secretory vesicles and an extensible basal apparatus (Eeckhaut et al. 1997). Upon contact with prey, the apical membrane of the colloblast ruptures, releasing the adhesive (Franc 1978). Their association with tentacles and their specialized role in prey capture have led some to propose that colloblasts are the functional analogs of the cnidarian cnidocyte (stinging cell; Alie et al. 2011;Borisenko and Ereskovsky 2013).
The tentacles of ctenophores are composed of a central axis of muscle and nerve fibers embedded in a gelatinous extracellular layer (the mesoglea) surrounded by a monolayer of epidermal cells. In many (but not all) species of ctenophore, the tentacles are adorned by numerous side branches (tentilla), and in some ctenophores (e.g., Euplokamis), these side branches are extensible and prehensile (Mackie et al. 1988). During feeding, the tentacles and tentilla (when present) are extended or uncoiled into the water column to ensnare passing prey (Mackie et al. 1988;Emson and Whitfield 1991). While colloblasts have been described as the predominant cell type of the tentacle/tentillum epidermis, several other cell types are known to populate these tissues (fig. 1G): covering cells (also known as cap cells or support cells), two types of sensory neurons (ciliated sensory cells and hoplocytes/peg cells), and two types of gland cells (mucus-secreting and granular gland cells; Horridge 1965;Emson and Whitfield 1991;Eeckhaut et al. 1997;Borisenko and Ereskovsky 2013;Carre and Carre 1989).
The feeding behaviors of ctenophores are diverse but typically involve entangling prey in the extended tentacles or trapping prey with the oral lobes (Haddock 2007). One group of ctenophores (genus Haeckelia) has tentacles devoid of colloblasts; instead, their tentacles are populated by cnidocytes sequestered from their cnidarian prey (Carre and Carre 1980;Mills and Miller 1984). Lacking tentacles completely, ctenophores in the genus Beroe (the sister group to Haeckelia; Podar et al. 2001;Simion et al. 2015) engulf their prey (other ctenophores) with expanded lips and remove chunks of tissue using "teeth" made from modified cilia (Tamm 1983;Haddock 2007). Many species of lobate ctenophore (e.g., Mnemiopsis leidyi, Bolinopsis infundibulum) undergo ontogenetic change in their behavior, relying on the use of tentacles in the juvenile stage and oral lobes as adults. In these taxa, the adult tentacles are short and become restricted to an oral fringe following metamorphosis. In contrast, the adult Beroe develops directly from an atentaculate larva. Thus, whereas the gene regulatory network underlying the development of tentacles may be downregulated in the adult stage of many lobate species, this network may not function at any stage in beroids.
Although they are a clear example of an evolutionary novelty, little is known about the origin of colloblasts. In this study, we leveraged the evolutionary history of ctenophores (including phylogeny, gene loss, and trait loss) to identify genes specific to this novel cell type. We hypothesized that some of the genes associated with colloblast development would have been lost during the diversification of Beroe and Haeckelia from their colloblast-bearing ancestor. Likewise, we hypothesized that tentacle genes would have been lost in the stem lineage of Beroe. Using comparative transcriptomics, we searched for genes that were present in most ctenophores but were absent from lineages that lack colloblasts or tentacles. We tested the hypothesis that these were trait-specific genes by examining their expression during tentacle morphogenesis in M. leidyi using fine-scale temporal RNA-Seq. We further validated these results using adult tissue-specific and cell-specific RNA-Seq data sets. Using this approach, we report the first genetic characterization of the colloblasts, a truly novel and poorly understood cell type.
Tentacle-Specific Cell Types. doi:10.1093/molbev/msy171 MBE

Colloblasts Were Secondarily Lost from Beroe + Haeckelia
We assembled a species tree using 18S sequences from 36 species of ctenophore (fig. 2). Our tree is congruent with previous reports of relationships among clades within Ctenophora (Podar et al. 2001;Simion et al. 2015;Whelan et al. 2017) and supports both the monophyly of the Beroe + Haeckelia clade and the position of this clade within the larger clade of colloblast-bearing lineages. This topology confirms that lack of colloblasts and lack of tentacles are derived traits.

Identifying Colloblast and Tentacle Candidate Genes
To identify colloblast and tentacle candidate genes, we searched for genes that were missing from taxa lacking these traits. To do this, we sequenced and assembled transcriptomes from the same 36 taxa, including three species of Beroe and two species of Haeckelia. In most cases, transcriptomes were generated from adult animals; for M. leidyi and Beroe ovata, transcriptomes were assembled from a combination of adults, embryos, and larvae. Using OrthoFinder (Emms and Kelly 2015), we generated 13,483 groups of orthologous genes, of which 189 contained representatives from at least 70% of all ctenophore taxa (including M. leidyi) but lacked Beroe and Haeckelia. Hereafter we refer to these as "colloblast candidate genes" (fig. 3A). Likewise, 165 groups contained orthologs from at least 70% of the taxa, including M. leidyi and at least one species of Haeckelia, but lacked Beroe ("tentacle candidate genes"). We confirmed that both sets of candidate genes were absent not only from the transcriptome but also from the genome of B. ovata (European Nucleotide Archive accession number PRJEB23672).
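The presence/absence rule that defines the two candidate sets can be written as a short function. This is a minimal sketch, not the authors' pipeline: it assumes each OrthoFinder orthogroup has already been reduced to the set of taxa it contains, applies the 70% threshold to the total number of sampled taxa, and uses hypothetical taxon labels.

```python
from typing import Dict, Set, Tuple

# Hypothetical taxon labels; a real analysis would list every sampled species.
BEROE = {"Beroe_ovata", "Beroe_sp1", "Beroe_sp2"}
HAECKELIA = {"Haeckelia_rubra", "Haeckelia_sp1"}
FOCAL = "Mnemiopsis_leidyi"


def classify_orthogroup(taxa: Set[str], n_taxa: int, min_frac: float = 0.7) -> str:
    """Label one orthogroup, given the set of taxa it contains."""
    widespread = len(taxa) >= min_frac * n_taxa and FOCAL in taxa
    if not widespread or taxa & BEROE:
        # Too sparse, missing M. leidyi, or present in at least one Beroe.
        return "other"
    if taxa & HAECKELIA:
        # Absent from Beroe but present in at least one Haeckelia.
        return "tentacle"
    # Absent from both Beroe and Haeckelia.
    return "colloblast"


def split_candidates(groups: Dict[str, Set[str]], n_taxa: int) -> Tuple[Set[str], Set[str]]:
    """Return (colloblast candidate IDs, tentacle candidate IDs)."""
    colloblast: Set[str] = set()
    tentacle: Set[str] = set()
    for og_id, taxa in groups.items():
        label = classify_orthogroup(taxa, n_taxa)
        if label == "colloblast":
            colloblast.add(og_id)
        elif label == "tentacle":
            tentacle.add(og_id)
    return colloblast, tentacle
```

With 36 taxa, `min_frac * n_taxa` is 25.2, so under this reading of the threshold an orthogroup needs at least 26 taxa to qualify.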
We hypothesized that colloblast- and tentacle-specific genes would be expressed during or after the onset of tentacle outgrowth (Martindale 1986;Alie et al. 2011). To test this, we examined gene expression during the first 20 h of development in M. leidyi using an RNA-Seq time course (fig. 3B). After removing genes with no expression (7/189 colloblast genes and 10/165 tentacle genes), we found that 66% (120/182 genes) of the expressed colloblast candidates and 56% (87/155 genes) of the expressed tentacle candidates had higher abundance during tentacle morphogenesis (12-20 h post fertilization, hpf) than during early development (0-9 hpf; fig. 3C). We compared this to the number of M. leidyi protein models (ML2.2; https://research.nhgri.nih.gov/mnemiopsis/; last accessed September 10, 2018) that were expressed during this time course (12,646/16,548 models) and found that only 37% of the protein models (4,691/12,646 models) exhibited higher expression during tentacle morphogenesis. Using a random sampling approach (see Materials and Methods), we found that both sets of candidate genes were significantly enriched for late-expressed genes (P < 0.0001 for each).
Babonis et al. doi:10.1093/molbev/msy171 MBE
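The late-versus-early comparison and the random-sampling test can be sketched as follows. This is an illustration under stated assumptions: the exact resampling scheme belongs to the original Materials and Methods, which are not reproduced here, so the sketch simply draws gene sets of the same size as the candidate set from all expressed protein models and reports the fraction of random sets containing at least as many late-expressed genes as the candidates do.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Set

EARLY_HPF = range(0, 10)   # 0-9 hpf: early development
LATE_HPF = range(12, 21)   # 12-20 hpf: tentacle morphogenesis


def is_late(profile: Dict[int, float]) -> bool:
    """True if mean abundance at 12-20 hpf exceeds mean abundance at 0-9 hpf.

    `profile` maps hours post fertilization to an abundance value; hours that
    were not sampled are skipped (each window needs at least one sampled hour).
    """
    early = mean(profile[h] for h in EARLY_HPF if h in profile)
    late = mean(profile[h] for h in LATE_HPF if h in profile)
    return late > early


def enrichment_p(candidates: Sequence[str], late_genes: Set[str],
                 background: List[str], n_iter: int = 10_000, seed: int = 1) -> float:
    """Empirical one-sided P value for enrichment of late-expressed genes."""
    rng = random.Random(seed)
    observed = sum(g in late_genes for g in candidates)
    as_extreme = 0
    for _ in range(n_iter):
        sample = rng.sample(background, len(candidates))
        if sum(g in late_genes for g in sample) >= observed:
            as_extreme += 1
    # 0.0 means no random set matched; report it as P < 1 / n_iter.
    return as_extreme / n_iter
```

With 10,000 iterations, a result of 0.0 corresponds to the "P < 0.0001" style of reporting used in the text.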
Next, we used quality threshold (QT) clustering (Heyer et al. 1999) to group candidate genes with similar expression patterns. Among colloblast candidates, the two largest clusters consisted of 27 and 11 genes (fig. 3D). The cluster containing 27 genes was characterized by a peak in expression at 11 hpf followed by a second peak at 14 hpf, whereas genes in the cluster containing 11 genes first peaked at 14 hpf with a second peak at 18 hpf. The two largest clusters of tentacle candidates consisted of 18 and 6 genes (fig. 3E). Both clusters exhibited an early peak at 11 hpf followed by peaks at 14 hpf, 16 hpf, and 18 hpf. (Accession numbers for clustered genes are provided in supplementary file 1, Supplementary Material online.)
We further validated the colloblast and tentacle candidates by examining their expression in two adult tissues: tentacle bulbs and comb rows (fig. 4A). Over 70% of the candidate genes were also expressed in the adult tissues we sampled (N = 138/189 colloblast candidates, N = 130/165 tentacle candidates). Using differential expression analysis, we found that 33% of the colloblast candidates (N = 46/138) and 20% of the tentacle candidates (N = 26/130) were significantly upregulated in the tentacle bulb compared with the comb row (fig. 4B). Both sets of candidates were significantly enriched for tentacle bulb expression, compared with randomly selected data sets (P < 0.0001 for both).
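The QT clustering step can be illustrated with a compact version of the algorithm described by Heyer et al. (1999): for every unclustered gene, grow a candidate cluster by repeatedly adding the gene that increases the cluster diameter least, stop when the diameter would exceed a threshold, keep the largest candidate, and repeat. This sketch uses plain Pearson distance (1 - r), which may differ from the distance measure used in the study, and the diameter threshold and minimum cluster size are hypothetical parameters; the values behind fig. 3D-E are not given here.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence


def pearson_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Pearson correlation; 1.0 if either profile is constant."""
    ma, mb = mean(a), mean(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if den == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(da, db)) / den


def qt_cluster(profiles: Dict[str, Sequence[float]], max_diameter: float,
               min_size: int = 2) -> List[List[str]]:
    """Quality threshold clustering: no cluster may exceed `max_diameter`."""
    def dist(g: str, h: str) -> float:
        return pearson_distance(profiles[g], profiles[h])

    remaining = set(profiles)
    clusters: List[List[str]] = []
    while remaining:
        best: List[str] = []
        for seed in sorted(remaining):  # build one candidate cluster per seed
            cluster = [seed]
            pool = remaining - {seed}
            while pool:
                # Gene whose addition increases the cluster diameter least.
                nxt = min(sorted(pool), key=lambda g: max(dist(g, m) for m in cluster))
                if max(dist(nxt, m) for m in cluster) > max_diameter:
                    break
                cluster.append(nxt)
                pool.remove(nxt)
            if len(cluster) > len(best):
                best = cluster
        if len(best) < min_size:
            break  # leftover genes stay unclustered
        clusters.append(best)
        remaining -= set(best)
    return clusters
```

Because the largest qualifying cluster is removed first, the output is ordered by size, which matches how the text reports "the two largest clusters" for each candidate set.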
Using a published data set reporting differential expression of genes across individual cell types in M. leidyi (Sebe-Pedros, Chomsky, et al. 2018), we found significant clustering of colloblast candidate genes (N = 18) in a single cell (C52; P < 0.0001) and another large cluster (N = 11) of colloblast candidates in a second cell (C53; fig. 4C). We also found significant clustering of tentacle candidates (N = 12) in a third cell (C54; P = 0.0015). The remaining expressed candidate genes were distributed across the other cell types, none of which had a cluster of more than four candidate genes. Cells C52, C53, and C54 were undescribed by Sebe-Pedros, Chomsky, et al. (2018); however, based on the significant overrepresentation of candidate genes in these cells, we suggest that C52 and C53 are colloblasts and C54 is another tentacle-specific cell type.
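The text reports P values for the clustering of candidates in individual cells but does not state which test produced them. One common choice for this kind of overrepresentation question is a one-sided hypergeometric test, sketched below as an assumption rather than as the study's method. The sketch also assumes that the background is the set of all genes with cell-type-specific expression in the published atlas and that a cell's gene set is the list of genes reported as enriched in that cell.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K candidates, n genes drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def cell_cluster_p(cell_genes: Set[str], candidates: Set[str], background: Set[str]) -> float:
    """One-sided P value that a cell's gene set holds this many candidates by chance."""
    N = len(background)
    K = len(candidates & background)
    n = len(cell_genes & background)
    k = len(cell_genes & candidates & background)
    return hypergeom_sf(k, N, K, n)
```

When many cells are tested, the resulting P values would normally need a multiple-testing correction before calling a cluster significant.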
To characterize these putative colloblast and tentacle cell types further, we first searched both sets of candidates and all genes expressed in cells C52, C53, and C54 for transcription factors that have been previously characterized in M. leidyi (Pang and Martindale 2008;Jackson et al. 2010;Pang et al. 2010;Yamada et al. 2010;Pang et al. 2011;Reitzel et al. 2011;Schnitzler et al. 2012;Simmons et al. 2012;Schnitzler et al. 2014;Reitzel et al. 2016). Additionally, we performed reciprocal BLAST of these data sets against the human proteome and annotated the results of both searches using Gene Ontology (GO). To identify transcription factors, we searched specifically for the following GO terms: GO:0003677 (DNA binding); GO:0003700 (DNA binding, transcription factor activity); GO:0006351 (transcription, DNA templated); and GO:0006355 (regulation of transcription, DNA templated). These combined approaches led to the discovery of seventeen transcription factors (fig. 4D), five of which have been previously studied in M. leidyi: Nuclear Receptor 2 (MlNR2), Nuclear Receptor 1 (MlNR1), MlBsh, MlIslet, and MlPRD10a (Pang and Martindale 2008;Ryan et al. 2010;Reitzel et al. 2011;Simmons et al. 2012). Four of these (excluding MlPRD10a) have been previously characterized during embryonic development in M. leidyi using in situ hybridization. Whereas MlNR2 is expressed ubiquitously throughout development, MlNR1 is expressed in the tentacle bulb and apical organ, MlBsh is restricted to the tentacle bulb, and MlIslet is restricted to the apical organ (fig. 4E). Using tissue-specific transcriptomes from adults, we confirmed that MlNR1, MlBsh, MlIslet, and MlPRD10a are all upregulated in the tentacle bulb, relative to the comb rows (fig. 4F). MlNR2 was expressed in both tentacle bulbs and comb rows but was not differentially expressed.
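Combining the two annotation routes described above (literature-curated M. leidyi transcription factors and GO terms transferred from human BLAST hits) amounts to a set union. A minimal sketch, assuming the GO annotations have already been mapped onto each M. leidyi gene and the curated factors are available as a set of gene IDs; both inputs and all gene IDs in the example are hypothetical.

```python
from typing import Dict, Iterable, Set

# The four GO terms listed in the text.
TF_GO_TERMS = {"GO:0003677", "GO:0003700", "GO:0006351", "GO:0006355"}


def putative_tfs(genes: Iterable[str], go_annotations: Dict[str, Set[str]],
                 known_tfs: Set[str]) -> Set[str]:
    """Union of GO-based calls and previously characterized M. leidyi TFs."""
    genes = set(genes)
    # A gene qualifies if any of its transferred GO terms is a TF-related term.
    by_go = {g for g in genes if go_annotations.get(g, set()) & TF_GO_TERMS}
    by_literature = genes & known_tfs
    return by_go | by_literature
```

Note that GO:0003677 (DNA binding) alone is broad, so calls made only on that term would likely need manual review.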

Characterizing Candidate Genes
Consistent with other studies of metazoan novelties (Johnson and Tsutsui 2011;Babonis et al. 2016), we hypothesized that the set of colloblast candidates would be enriched for novel (ctenophore-specific) genes. To test this, we used a reciprocal BLAST strategy to search candidate genes against a database of animal genomes (fig. 5A); we considered genes that lacked significant hits outside of Ctenophora (E ≤ 1e-02) to be ctenophore-specific. Over 40% (79/189) of the colloblast candidates were ctenophore-specific, whereas only 28% (46/165) of the tentacle candidates and 29% (4,766/16,548) of all protein models (ML2.2) were ctenophore-specific (fig. 5B). Random sampling confirmed that colloblast candidates were significantly enriched for novel genes (P < 0.0001), whereas tentacle candidates were not (P = 0.643).
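The ctenophore-specificity call reduces to checking, for each query, whether any significant hit falls outside Ctenophora. The sketch assumes the search was run with BLAST+ tabular output (`-outfmt 6`, where column 11 is the E-value) and that a separate, hypothetical table maps each subject sequence to its clade. It covers only the forward search; the reciprocal step mentioned above is not shown.

```python
import csv
from typing import Dict, Iterable, Set


def ctenophore_specific(queries: Iterable[str], blast_lines: Iterable[str],
                        subject_clade: Dict[str, str], max_evalue: float = 1e-2) -> Set[str]:
    """Queries with no hit outside Ctenophora at E <= max_evalue.

    `blast_lines` are BLAST+ tabular rows (-outfmt 6). Subjects missing from
    `subject_clade` are treated as non-ctenophore hits, which errs toward
    calling genes non-specific.
    """
    outgroup_hit: Set[str] = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue and subject_clade.get(subject) != "Ctenophora":
            outgroup_hit.add(query)
    # Queries with no qualifying outgroup hit (including no hits at all).
    return set(queries) - outgroup_hit
```

Queries with no BLAST hits at all count as ctenophore-specific here, which is consistent with the definition in the text (lacking significant hits outside Ctenophora).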
To evaluate their putative function, we annotated both sets of candidate genes against the InterPro Consortium database using Interproscan (Jones et al. 2014). We found that […] (fig. 5D). Among colloblast candidates, overrepresented categories were largely associated with secretion/cell membrane recognition (e.g., sushi/SCR, MACPF, vWD, Ca-EGF, lectin, Golgi transport) and enzymes involved in cellular metabolism (e.g., cytochrome P450, glutaredoxin, carbonic anhydrase, nucleoside hydrolase, acetyltransferase). Among tentacle candidates, the largest overrepresented category consisted of enzymes involved in posttranslational modification (i.e., sulfotransferase, thioesterase, phosphatase, mannosyltransferase, glycosyl hydrolase, cyclotransferase).
Fig. 4 (caption fragment). […] and comb row (CR) transcriptome sequencing. (B) Over 70% of the colloblast and tentacle candidate genes were expressed in adult tissues (N = 138/189 colloblast candidates, N = 130/165 tentacle candidates). In both data sets, a significant proportion of the expressed genes (P < 0.0001 for both colloblast and tentacle candidates) were upregulated in the tentacle bulb, relative to the comb row (log2-fold change ≥ 2, padj < 0.05). (C) There was a significant cluster of colloblast candidates in cell C52 (P < 0.0001), with a second cluster in cell C53. Tentacle candidates clustered significantly in cell C54 (P = 0.0015). Cell IDs refer to single-cell sequencing results reported in Sebe-Pedros, Chomsky, et al. (2018).

Searching for Ctenophore Adhesive Proteins
We used BLAST to search candidate genes against a set of known adhesive proteins from other invertebrates (Hennebert et al. 2015). (Sequences provided in supplementary file 2, Supplementary Material online.) Five colloblast candidates and four tentacle candidates had significant hits to adhesive proteins (E ≤ 1e-03), yet each of these genes had better hits to other proteins in the Uniprot database (www.uniprot.org; table 1). Next, we compared protein family (Pfam) domains from the known adhesives to domains identified from candidate genes using Interproscan. From the 48 confirmed adhesive proteins, we identified 17 Pfam domains. One domain was shared among all three data sets (von Willebrand factor type D domain, PF00094), one was shared by adhesives and colloblast candidates only (EGF-like calcium-binding domain, PF07645), and one was shared by adhesives and tentacle candidates only (Chitin binding domain, PF01607). Consistent with our BLAST results, neither colloblast nor tentacle candidates exhibited significant overlap with Pfam domains from known adhesives (colloblasts P = 0.5008, tentacles P = 0.4459).
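The domain comparison above is a three-way set intersection. The sketch uses the three Pfam accessions named in the text plus hypothetical placeholders (`PF_A`, `PF_B`) standing in for the remaining domains; it shows only the bookkeeping, not the significance test, whose randomization scheme is not described in this section.

```python
from typing import Dict, Set


def domain_overlap(adhesive: Set[str], colloblast: Set[str],
                   tentacle: Set[str]) -> Dict[str, Set[str]]:
    """Partition adhesive Pfam domains by which candidate sets also carry them."""
    return {
        "all three": adhesive & colloblast & tentacle,
        "adhesive + colloblast only": (adhesive & colloblast) - tentacle,
        "adhesive + tentacle only": (adhesive & tentacle) - colloblast,
        "adhesive only": adhesive - colloblast - tentacle,
    }
```

For example, `domain_overlap({"PF00094", "PF07645", "PF01607", "PF_A"}, {"PF00094", "PF07645", "PF_B"}, {"PF00094", "PF01607"})` places PF00094 in "all three", PF07645 in "adhesive + colloblast only", and PF01607 in "adhesive + tentacle only", mirroring the pattern reported above.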
Given that we did not find strong BLAST support for homology between candidate genes and other biological adhesive proteins, we searched instead for features known to be enriched among described adhesive proteins, including a secretion signal peptide, single-pass transmembrane domains, and regions of low sequence complexity (Waite et al. 2005;Endrizzi and Stewart 2009). Using SignalP (Petersen et al. 2011) and TMHMM (Krogh et al. 2001), we found that 28% (53/189) of the colloblast candidates encoded a signal peptide and 36% (68/189) encoded one or more transmembrane […] (fig. 6C), although the number of transmembrane domains in these candidate gene sets did not differ significantly from samples drawn randomly from ML2.2 (colloblasts P = 0.5961, tentacles P = 0.4756). Therefore, neither set of candidate genes was enriched for single-pass transmembrane domains. Finally, we assessed sequences in both data sets for regions of low complexity using the program Segmasker (Wootton 1994). Contrary to our expectations based on other biological adhesives, colloblast and tentacle candidates were not enriched for regions of low complexity (colloblasts P = 0.427, tentacles P = 0.98).
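Counting single-pass proteins from TMHMM predictions can be sketched as below. The sketch assumes TMHMM 2.0's one-line-per-sequence short output (`-short`), in which a `PredHel=` field gives the number of predicted helices; the sequence IDs in the example are hypothetical. Because TMHMM can mistake the hydrophobic core of an N-terminal signal peptide for a transmembrane helix, single-helix calls near the N terminus may be worth cross-checking against the SignalP results.

```python
import re
from typing import Dict, Iterable, Set

PRED_HEL = re.compile(r"\bPredHel=(\d+)")


def helix_counts(tmhmm_short_lines: Iterable[str]) -> Dict[str, int]:
    """Map sequence ID (first field) -> predicted number of TM helices."""
    counts: Dict[str, int] = {}
    for line in tmhmm_short_lines:
        fields = line.split()
        match = PRED_HEL.search(line)
        if fields and match:
            counts[fields[0]] = int(match.group(1))
    return counts


def single_pass(counts: Dict[str, int]) -> Set[str]:
    """Sequences predicted to cross the membrane exactly once."""
    return {seq for seq, n in counts.items() if n == 1}
```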

Searching for Ctenophore Toxin Proteins
Moss et al. (2001) suggested the possibility that colloblasts or other secretory cells in ctenophores may secrete a toxin. To test this hypothesis, we used BLAST to search both sets of candidate genes against a database of known animal venoms/toxins, referred to hereafter as "ToxProt" (Jungo et al. 2012). From the colloblast candidates, we identified a single gene (ML263512a) with a significant match in the ToxProt database (E ≤ 1e-03); however, this gene had better hits to Uniprot proteins outside of the ToxProt database (table 2). Among tentacle candidates, we identified 12 sequences with significant hits in the ToxProt database, only one of which (ML435831a) had an equivalent or better hit to a protein in the ToxProt database than to any nontoxin protein in the Uniprot database.
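The "equivalent or better hit" criterion can be expressed as a comparison of best E-values. In this sketch each hit is a hypothetical (subject ID, E-value, is-ToxProt) tuple, the ToxProt hit must pass the E ≤ 1e-03 cutoff, and a tie counts as support; treating ties this way is an assumption based on the wording "equivalent or better".

```python
from typing import List, Tuple

# (subject ID, E-value, True if the subject is a ToxProt entry)
Hit = Tuple[str, float, bool]


def toxin_supported(hits: List[Hit], max_evalue: float = 1e-3) -> bool:
    """True if the best significant ToxProt hit is at least as good as the best non-toxin hit."""
    tox = [e for _, e, is_tox in hits if is_tox and e <= max_evalue]
    non_tox = [e for _, e, is_tox in hits if not is_tox]
    if not tox:
        return False  # no significant ToxProt hit at all
    # Lower E-value is better; a tie counts as "equivalent".
    return not non_tox or min(tox) <= min(non_tox)
```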
Finally, we searched the suite of genes identified from cells C52, C53, and C54 (Sebe-Pedros, Chomsky, et al. 2018) against the ToxProt database to determine if there was additional support for the secretion of toxins in these putative tentacle cell types. We identified five cells with significant clusters of ToxProt genes: C17 (P = 0.016), C21 (P = 0.036), C25 (P = 0.037), C47 (P = 0.029), and C54 (P = 0.020; supplementary fig. 1, Supplementary Material online). Cell C54 was identified by Sebe-Pedros, Chomsky, et al. (2018) only by the presence of a protein with an ShK domain, a domain originally identified from sea anemone toxins.

A Common Origin for Colloblasts and Neurons
Although cell fate has been fairly well characterized in M. leidyi (Martindale and Henry 1997;Martindale and Henry 1999;Henry and Martindale 2001), previous studies of cell fate have been performed only up to the 60-cell (pregastrula) […] (fig. 7A-D) and allowed embryos to develop to the cydippid stage, as previously described. From 28 embryos with individually labeled cells, we recovered seven cydippids (25%) with labeled colloblasts on the side corresponding to the injected micromere. Surprisingly, all seven of these cydippids also exhibited DiI-labeled neurons, either in the floor of the apical organ (fig. 7E-G) or in the peripheral nerve net (fig. 7H), suggesting that colloblasts and neurons differentiate from a common progenitor that acquires its identity after gastrulation.

No Common Origin for Colloblasts and Cnidocytes
We tested the hypothesis that colloblasts and cnidocytes share a common evolutionary origin by searching for orthologous genes in these two cell types. Using OrthoFinder, we generated orthology groups using protein models from M. leidyi and the sea anemone Nematostella vectensis. From this analysis, we identified four groups containing at least one candidate gene (colloblast or tentacle) and at least one of the proteins identified as cnidocyte-specific in a recent study using single-cell sequencing from N. vectensis (Sebe-Pedros, Saudemont, et al. 2018; table 3).
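Picking out the orthology groups that link the two cell types is a membership test on each OrthoFinder group. A minimal sketch, assuming the groups have already been parsed into sets of gene IDs; the IDs in the example are hypothetical and are not taken from table 3.

```python
from typing import Dict, List, Set


def shared_groups(orthogroups: Dict[str, Set[str]], candidates: Set[str],
                  cnidocyte_genes: Set[str]) -> List[str]:
    """Orthogroups holding >= 1 M. leidyi candidate and >= 1 cnidocyte gene."""
    return sorted(
        og for og, members in orthogroups.items()
        if members & candidates and members & cnidocyte_genes
    )
```

For example, with groups `{"OG1": {"ML_a", "NV_x"}, "OG2": {"ML_b", "NV_y"}, "OG3": {"ML_c"}}`, candidates `{"ML_a", "ML_c"}`, and cnidocyte genes `{"NV_x", "NV_y"}`, only `OG1` is returned.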
Two groups included colloblast candidates and cnidocyte genes; the first contains orthologs of fibrillin, a glycoprotein component of the extracellular matrix, and the second contains orthologs of retinoid X receptors (RXRs). Upon closer inspection, the colloblast candidate in this latter group turned out to be the previously studied nuclear receptor MlNR2 (Reitzel et al. 2011). We used BLAST to search MlNR2 against B. ovata and confirmed that this gene is missing from both the transcriptome and the genome of B. ovata. The other two groups contained tentacle candidates and cnidocyte genes. The first of these contained orthologs of protein-O-mannosyl transferase 2, an important regulator of protein glycosylation. Genes in the second group share homology with DELTA-alicitoxin, a pore-forming toxin from sea anemones. Compared with data sets sampled randomly from ML2.2, orthology groups containing cnidocyte genes were not significantly enriched for colloblast (P = 0.8519) or tentacle (P = 0.7869) candidates.

Discussion
First described nearly 200 years ago (Eschscholtz 1829), ctenophores remain a poorly understood group of animals. By combining phylogeny, natural variation in morphology, analyses of embryonic and adult gene expression, and detailed sequence annotations, we have identified and characterized genes associated with tentacle-specific cell types. While we recognize the possibility that our data sets may include genes […] (fig. 4C), suggests this approach was effective for identifying genes associated with tentacle cell identity.
As part of this work, we have likely uncovered novel components of an undescribed biological adhesive. Consistent with other adhesives, colloblast candidate genes were enriched for domains associated with secretion, membrane recognition, and subcellular protein trafficking (fig. 5, supplementary file 1, Supplementary Material online). Furthermore, colloblast candidates were enriched for a hydrophobic N-terminal signal peptide (fig. 6). Signal peptides are important for directing proteins to the vesicles in numerous secretory cell types, including cnidocytes (Anderluh et al. 2000), cells from venom glands (Jones et al. 1992), and adhesive-secreting cells from other animal groups (Hennebert et al. 2015). Thus, the genes we identified as colloblast candidates are consistent with those expected to be expressed in a cell undergoing synthesis, packaging, and storage of secreted proteins. Surprisingly, we found no BLAST support for the homology of colloblast candidates with other biological adhesive proteins (table 1) and, unlike other biological adhesives, colloblast candidates were not enriched for regions of low complexity. Combined with the overrepresentation of ctenophore-specific genes among colloblast candidates (fig. 5), our results suggest that the origin of the colloblast adhesive was largely independent of the evolution of adhesives in other biological systems. Unlike other animal adhesives (e.g., sea star foot protein, mussel byssal threads), the colloblast adhesive must be fast-acting ("instantaneous") but need not be permanent (Flammang et al. 2009); these constraints may have facilitated the origin of an adhesive with unique properties in the stem ctenophore. Indeed, we suggest that rapid evolution of existing genes (Martin-Duran et al. 2017), resulting in de novo acquisition of novel peptide motifs, may have promoted the origin of the colloblast adhesive.
We further leveraged the secondary loss of tentacles in the genus Beroe to identify compelling candidate genes for future studies of other tentacle-specific cell types in ctenophores. Tentacle candidates were enriched for signal peptides as well as enzymes involved in posttranslational protein modification (figs. 5 and 6, supplementary file 1, Supplementary Material online). One intriguing interpretation is that these enzyme-rich tentacle secretory cells are some type of gland cell engaged in the production and secretion of a ctenophore toxin.
In support of this, we identified one gene from the tentacle candidates (ML435831a) that encodes both a signal peptide and a MACPF domain and appears to be an ortholog of actinoporin, a pore-forming DELTA-alicitoxin found in sea anemone cnidocytes (table 3; Oshiro et al. 2004;Rachamim et al. 2015). Further supporting the potential role of this tentacle cell type in producing a toxin, we demonstrate significant clustering of tentacle candidates (fig. 4C) in a single cell (C54) that also expresses the largest number of genes with significant hits in the ToxProt database (supplementary fig. 1, Supplementary Material online). While empirical observations are essential for evaluating the function of this cell type, these results suggest that ctenophores may incapacitate their prey by secretion of pore-forming toxins from a tentacle-specific gland cell. A toxin-secreting cell may have provided many ecological benefits, even among taxa lacking colloblasts, which could explain why this cell type may have been retained in Haeckelia.
Notably, both data sets (colloblast and tentacle candidate genes) were largely devoid of transcription factors. Essential for activating and/or repressing the expression of effector genes (e.g., secreted or structural products), transcription factors are known to be highly pleiotropic, regulating gene expression in numerous regulatory networks. […] however, the fact that each Sox gene is expressed in additional domains outside of the tentacle bulbs in both species suggests these genes play many roles in the development of ctenophores. Consistent with this, Sox genes were not identified among the colloblast or tentacle candidates, and the transcriptome of B. ovata encodes complete orthologs of all six ctenophore Sox genes (supplementary file 3, Supplementary Material online).
Annotation of both the candidate gene data sets and the putative colloblast (C52, C53) and tentacle (C54) cell types published previously (Sebe-Pedros, Chomsky, et al. 2018) enabled us to identify seventeen putative transcription factors that may play a role in patterning tentacle-specific cell types in M. leidyi (fig. 4D-F). Possible colloblast transcription factors (MlNR1, MlNR2, and MlBsh) are all known to be expressed in the tentacle bulb during tentacle morphogenesis (Pang and Martindale 2008;Reitzel et al. 2011), and we demonstrate significant upregulation of MlNR1 and MlBsh in the adult tentacle bulb as well. The role of the putative toxin cell transcription factors (MlIslet and MlPRD10a) is less clear. While both are upregulated in the adult tentacle bulb, MlIslet does not appear to be expressed in the tentacle primordia during embryonic development […], and the spatial expression of MlPRD10a has not been characterized […]. Intriguingly, the genome of B. ovata encodes clear orthologs of MlNR1, MlBsh, and MlIslet but lacks orthologs of MlNR2 and MlPRD10a. Given that MlNR2 and MlPRD10a were identified as candidate genes from our phylogenetic analysis, we propose that knockdown of these genes in M. leidyi should result in loss of colloblasts and other tentacle-specific secretory cells.
Surprisingly, our data suggest a common embryological origin for colloblasts and neurons, as both cell lineages appear to be the descendants of a single micromere labeled in the late gastrula stage in M. leidyi (fig. 7). Previous cell lineage studies performed at earlier stages of development found that neural and epidermal cells arose from a common precursor (Martindale and Henry 1997;Martindale and Henry 1999). Our results extend these observations, showing that epidermal cells differentiate from this common lineage before the separation of the neuronal and colloblast identities, as only the latter two cell types arose from micromeres labeled at later stages of development. This indicates a closer embryological relationship between the latter two cell types. Assuming that neurons are homologous across ctenophores (Hernandez-Nicaise 1973), these results imply that the loss of colloblasts resulted from disruption of the colloblast-specific branch of this lineage, independent of the segregation of neurons. Considering that ctenophores in the genus Haeckelia have tentacle bulbs and tentacles but lack colloblasts, we further suggest that the loss of colloblasts was independent of the development of the tentacle bulb. Additional studies of cell fate during embryogenesis in ctenophores with and without tentacle bulbs would shed much-needed light on the evolution of morphological diversity in this group.
The shared embryological origin of colloblasts and neurons underscores one striking commonality between colloblasts and cnidocytes, as both cell types differentiate from a progenitor cell that also gives rise to neurons (Richards and Rentzsch 2014;Flici et al. 2017). Importantly, however, we found no additional evidence of a shared origin for these two cell types. Indeed, we found that colloblasts and cnidocytes express largely unique suites of genes, as only four orthology groups were identified from among the hundreds of colloblast and cnidocyte candidates (table 3). Thus, rather than inferring the origin of some ancestral colloblast/cnidocyte prototype, we suggest that these novel secretory cell types arose independently in each lineage by co-option of a progenitor cell that already had the capacity for regulated cell secretion (fig. 8). Assuming nervous systems are homologous across animals (Jekely et al. 2015;Ryan and Chiodin 2015), it is likely that this progenitor cell already gave rise to neurons and possibly other secretory cell types in the ancestor to the lineage encompassing ctenophores, cnidarians, and bilaterians (inset A, fig. 8). Studies characterizing the development of the epidermal sensory organs (sensilla) in flies support this explanation for the origin of novel secretory cells in bilaterians as well, since both the neural and secretory cells (thecogen, tormogen, and trichogen cells) underlying the sensilla also differentiate from a common progenitor (inset B; Hartenstein and Posakony 1989). The relationship of specialized animal secretory cells to neurons suggests that there may be some underlying property of "neural" progenitor cells that makes them more likely to give rise to novel cell types. Because of their critical role in cell-cell communication, neurons have a phenotype that enables the packaging, storage, and delayed secretion of their products. It is possible that this pathway is easy to co-opt for other secretory functions, which could explain why multiple independent lineages of novel cell types seem to have evolved from a progenitor giving rise to neurons. Alternatively, cells that secrete a novel product may simply be easy to positively identify as novel cell types, artificially inflating the relationship of neurons to novelty.

Table 3. Orthology Groups Containing Colloblast (C) or Tentacle (T) Candidates from M. leidyi and Cnidocyte Genes from N. vectensis.
Set | ML Gene ID | NV Gene ID | Others in Group | Description
C | ML020113a, ML056914a, ML50011a | NVJ_2203 | ML282520a, NVJ_108241, NVJ_113453, NVJ_117150, NVJ_117297, NVJ_119340, NVJ_123710, NVJ_129169, NVJ_137797, NVJ_142234, NVJ_146869, NVJ_154796, NVJ_157742, NVJ_16432, NVJ_198567, NVJ_202189, NVJ_208146, NVJ_209642, NVJ_210066, NVJ_223762, NVJ_224641, NVJ_22881, NVJ_2483, NVJ_3250, NVJ_32913, NVJ_37776, NVJ_46752, NVJ_48353, NVJ_61301, NVJ_67572, NVJ_6789, NVJ_70073, NVJ_79239, NVJ_79524, NVJ_80132, NVJ_80370, NVJ_83827, NVJ_84687, NVJ_87211, NVJ_87454 |

Given that Sox genes are expressed in the common progenitor of neurons and cnidocytes in cnidarians (Richards and Rentzsch 2014) and in the tentacle bulb of ctenophores (Jager et al. 2008; Schnitzler et al. 2014), we suggest that Sox genes may be good candidates for conferring general secretory cell identity across metazoans. Understanding the origin of other types of secretory cells (e.g., gland cells) in ctenophores and cnidarians and characterizing their developmental relationship to colloblasts/cnidocytes and neurons will be important for further assessing the ubiquity of this relationship between Sox gene expression and secretory cell phenotype.
The candidate genes described here now form the basis of future investigations into the origin, differentiation, and development of colloblasts and other tentacle-specific cell types in ctenophores. Future studies aimed at constructing the regulatory networks underlying ctenophore secretory cells (including neurons, colloblasts, and gland cells) will provide a unique opportunity to simultaneously characterize the poorly understood nervous system of ctenophores and probe the process by which novel secretory cells evolve. Cells with novel functions can be important for facilitating expansion into new ecological niches, ultimately promoting speciation and diversification. Over evolutionary time, Beroe and Haeckelia have transitioned to prey types (other ctenophores and cnidarians, respectively) that are atypical for ctenophores, suggesting trophic specialization and evolutionary loss of cell types may have facilitated diversification in Ctenophora.

Materials and Methods
Animal Collection, Tissue Processing, and Transcriptome Assembly
Most specimens were collected during blue-water dives or using remotely operated underwater vehicles from a region of the Eastern Central Pacific near the Monterey Bay Aquarium Research Institute (Moss Landing, CA), as described previously (Francis et al. 2015). These samples were snap frozen in liquid nitrogen and sequenced using a paired-end sequencing protocol at the University of Utah on an Illumina HiSeq 2000 platform with 100 amplification cycles. Briefly, read order was randomized and low-quality reads, adapters, and repeats were removed. For efficiency, subsets of reads were used to assemble transcriptomes. Assembly was performed with both Velvet/Oases v1.2.09/0.2.08 (Zerbino and Birney 2008; Schulz et al. 2012) and Trinity r2012-10-05 (Grabherr et al. 2011). Transcripts from both assemblers were combined and redundant sequences were removed using the sequniq utility in the GenomeTools package (Gremme et al. 2013).
For the developmental transcriptome series, adult M. leidyi were collected from the estuary behind the University of Florida's Whitney Laboratory for Marine Bioscience (St. Augustine, FL) and maintained in the dark for 8 h to induce spawning. Zygotes were collected before first cleavage (time 0) and embryos were collected every 30-60 min for the first 20 h of development. Embryos were collected individually (N = 3-6 embryos per time point) and snap frozen on dry ice. Samples were prepared and sequenced on an Illumina HiSeq 2500, as described previously (Levin et al. 2016). B. ovata was collected from a public boat ramp on the Intracoastal Waterway in Port Orange, FL. Adults were spawned in the lab following the protocol for M. leidyi, and embryos were collected individually (N = 4 per collection time) at 0, 6, 10, and 20 h postfertilization. RNA was extracted from all 16 embryos and from 4 adults and sent to the Genomic Sequencing and Analysis Facility at the University of Texas, Austin, for library preparation and sequencing on an Illumina HiSeq 2500.
For validation of putative colloblast and tentacle genes, we assembled tissue-specific transcriptomes from adult M. leidyi collected from the estuary behind the Whitney Lab. Tissues (tentacle bulbs and comb rows) were freshly isolated from wild-caught animals and snap frozen on dry ice. RNA extraction, library preparation, and sequencing were performed by the Interdisciplinary Center for Biotechnology Research at the University of Florida. Three independent replicates of each tissue were sequenced on a single lane of a HiSeq 3000 using a paired-end protocol. Differential expression analysis was performed using DESeq2 v1.20.0 (Love et al. 2014) in R v3.5.0 (R Core Development Team, 2008). Transcripts with ≥2 log2-fold change and an adjusted P-value ≤0.05 were considered differentially expressed. Raw sequence data have been deposited in the European Nucleotide Archive (accession PRJEB28334).
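The differential-expression cutoff amounts to a simple per-transcript filter over a DESeq2 results table. The sketch below is a minimal illustration, not the analysis script from the publication's repository; it assumes each row carries the `log2FoldChange` and `padj` values that DESeq2 reports, that the fold-change cutoff applies to the absolute value, and that both comparisons are inclusive. The transcript names are hypothetical.

```python
import math
from typing import Optional


def is_differentially_expressed(
    log2_fold_change: float,
    padj: Optional[float],
    min_abs_lfc: float = 2.0,
    max_padj: float = 0.05,
) -> bool:
    """Apply |log2FC| >= 2 and padj <= 0.05 to one DESeq2 result row.

    Assumption: inclusive thresholds on the absolute log2 fold change.
    A missing or NaN padj (DESeq2 reports NA for independently filtered
    genes) is treated as not differentially expressed.
    """
    if padj is None or math.isnan(padj):
        return False
    return abs(log2_fold_change) >= min_abs_lfc and padj <= max_padj


# Hypothetical rows: (transcript, log2FoldChange, padj), tentacle bulb vs. comb row.
rows = [
    ("tx_a", 3.1, 0.001),   # large change, significant -> kept
    ("tx_b", 1.2, 0.0001),  # significant but fold change too small -> dropped
    ("tx_c", -2.5, 0.02),   # large decrease -> kept under the absolute-value assumption
    ("tx_d", 4.0, None),    # padj missing -> dropped
]
de = [name for name, lfc, padj in rows if is_differentially_expressed(lfc, padj)]
print(de)  # ['tx_a', 'tx_c']
```

If only upregulation in the tentacle bulb were of interest, the `abs()` call would be replaced by a signed comparison.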

Identification and Annotation of Candidate Genes
To identify genes that had been lost in the lineage of ctenophores lacking colloblasts (Beroe + Haeckelia), we first created orthologous gene groups from the complete transcriptomes of all 36 ctenophore species using OrthoFinder v1.1.8 (Emms and Kelly 2015). Colloblast candidates were genes in orthology groups containing at least 70% of the taxa (including M. leidyi) but missing from all three species of Beroe (B. ovata, B. forskalii, and B. abyssicola) and from both species of Haeckelia (H. rubra and H. beehleri). Requiring these genes to be present in at least 70% of the transcriptomes (rather than 100%) allowed us to account for stochasticity in gene expression (i.e., genes not expressed at the time the animal was collected) and for gene losses that did not affect the maintenance of colloblasts. The number of candidate genes recovered varies considerably when this cutoff is allowed to range from 50% to 100%, and there was no clear choice for the single best proportion to use (supplementary fig. 2, Supplementary Material online). We arbitrarily chose 70%, but FASTA files of candidate genes recovered for all other cutoffs are provided in the GitHub Repository for this publication: https://github.com/josephryan/2018-Babonis_et_al_Ryan. Tentacle candidates were genes in orthology groups containing at least 70% of the taxa (including M. leidyi and at least one species of Haeckelia) but no species of Beroe. Three character states were possible for genes expressed in Haeckelia (see dagger, fig. 3A): present in both species, present in H. rubra only, or present in H. beehleri only. In each case, Haeckelia was counted toward the 70% total required to constitute ubiquitous expression across ctenophores.
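The presence/absence rules for colloblast and tentacle candidates can be sketched as set operations over orthology groups. This is a minimal illustration under stated assumptions, not the authors' code: each group is represented as the set of taxa it contains, taxon labels are hypothetical, each species present counts as one taxon toward the cutoff (the exact way Haeckelia was tallied may differ), and the cutoff is an integer percentage so the comparison avoids floating-point rounding. The toy data set has 20 taxa rather than 36.

```python
from typing import Dict, List, Set

# Hypothetical taxon labels standing in for the transcriptomes.
BEROE = {"B_ovata", "B_forskalii", "B_abyssicola"}
HAECKELIA = {"H_rubra", "H_beehleri"}
REFERENCE = "M_leidyi"


def meets_cutoff(taxa: Set[str], total_taxa: int, pct: int) -> bool:
    """True if the group spans at least `pct` percent of all taxa (integer math)."""
    return 100 * len(taxa) >= pct * total_taxa


def colloblast_candidates(groups: Dict[str, Set[str]], total_taxa: int,
                          pct: int = 70) -> List[str]:
    """Groups with M. leidyi, >= pct of taxa, and no Beroe or Haeckelia species."""
    lost = BEROE | HAECKELIA
    return sorted(gid for gid, taxa in groups.items()
                  if REFERENCE in taxa
                  and meets_cutoff(taxa, total_taxa, pct)
                  and not (taxa & lost))


def tentacle_candidates(groups: Dict[str, Set[str]], total_taxa: int,
                        pct: int = 70) -> List[str]:
    """Groups with M. leidyi, >= 1 Haeckelia species, >= pct of taxa, and no Beroe."""
    return sorted(gid for gid, taxa in groups.items()
                  if REFERENCE in taxa
                  and (taxa & HAECKELIA)
                  and meets_cutoff(taxa, total_taxa, pct)
                  and not (taxa & BEROE))


# Toy data set: 15 other taxa (incl. M. leidyi) + 3 Beroe + 2 Haeckelia = 20 taxa.
others = [REFERENCE] + [f"sp{i:02d}" for i in range(1, 15)]
TOTAL = len(others) + len(BEROE) + len(HAECKELIA)
groups = {
    "OG1": set(others),                  # absent from Beroe and Haeckelia
    "OG2": set(others) | {"H_rubra"},    # retained in Haeckelia, absent from Beroe
    "OG3": set(others) | {"B_ovata"},    # retained in Beroe
    "OG4": set(others[:10]),             # too sparse at 70%
    "OG5": set(others[1:]) | HAECKELIA,  # missing M. leidyi
}
print(colloblast_candidates(groups, TOTAL))  # ['OG1']
print(tentacle_candidates(groups, TOTAL))    # ['OG2']
# How the colloblast set changes as the cutoff sweeps from 50% to 100%.
print({pct: len(colloblast_candidates(groups, TOTAL, pct)) for pct in range(50, 101, 10)})
```

The final line mirrors the cutoff sensitivity described above: in this toy example a sparse group (OG4) enters the colloblast set only at 50%, and no group survives at 80% or above.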

Temporal Expression of Candidate Genes in M. leidyi Embryos
We examined the expression of candidate genes during embryonic development in M. leidyi using stage-specific RNA-Seq data (NCBI GEO accessions GSE60478 and GSE111748). First, we removed colloblast and tentacle candidate genes with no expression (N = 7/189 colloblast candidates and N = 10/165 tentacle candidates with TPM = 0 at all time points). We also removed sequences with no expression from the ML2.2 (https://research.nhgri.nih.gov/mnemiopsis/) protein models (N = 3,902/16,548 genes with TPM = 0 at all time points). For each remaining gene, we then calculated the ratio of late gene expression (after the onset of tentacle morphogenesis; 12-20 hpf) to early gene expression (0-9 hpf); late-expressed genes were those with a ratio >1. We applied the same approach to the full set of ML2.2 gene models. We used QT clustering (Heyer et al. 1999) to cluster candidate genes with similar expression patterns.
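The late-versus-early classification can be sketched as follows. The text does not say how the time points within each window were combined, so this sketch assumes the mean TPM per window; samples between 9 and 12 hpf are ignored, and a gene silent early but expressed late is given an infinite ratio. Gene names and TPM values are hypothetical, and every gene is assumed to have at least one sample in each window.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Windows from the text: early = 0-9 hpf, late = 12-20 hpf.
EARLY_HPF = (0.0, 9.0)
LATE_HPF = (12.0, 20.0)

Series = List[Tuple[float, float]]  # (hours post fertilization, TPM) pairs


def window_mean(series: Series, window: Tuple[float, float]) -> float:
    """Mean TPM over samples whose time falls inside `window` (inclusive)."""
    lo, hi = window
    return mean(tpm for hpf, tpm in series if lo <= hpf <= hi)


def late_early_ratio(series: Series) -> float:
    """Late/early mean-TPM ratio; inf if the gene is silent early but on late."""
    early, late = window_mean(series, EARLY_HPF), window_mean(series, LATE_HPF)
    if early == 0:
        return float("inf") if late > 0 else 0.0
    return late / early


def late_expressed(genes: Dict[str, Series]) -> List[str]:
    """Drop genes with TPM = 0 at every time point, then keep ratio > 1."""
    expressed = {g: s for g, s in genes.items() if any(tpm > 0 for _, tpm in s)}
    return sorted(g for g, s in expressed.items() if late_early_ratio(s) > 1)


genes = {
    "gene_late":   [(0, 1.0), (6, 1.0), (9, 2.0), (12, 5.0), (20, 7.0)],
    "gene_early":  [(0, 8.0), (6, 6.0), (9, 4.0), (12, 1.0), (20, 1.0)],
    "gene_silent": [(0, 0.0), (6, 0.0), (9, 0.0), (12, 0.0), (20, 0.0)],
    "gene_onset":  [(0, 0.0), (6, 0.0), (9, 0.0), (12, 3.0), (20, 2.0)],
}
print(late_expressed(genes))  # ['gene_late', 'gene_onset']
```

The same function can be run over every ML2.2 gene model to obtain the background fraction of late-expressed genes against which the candidates are compared.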

Cell-Specific Expression of Candidate Genes
We examined the distribution of candidate genes across individual cells from adult M. leidyi isolated for single cell sequencing by Sebé-Pedrós, Chomsky, et al. (2018). Significant clustering of candidate genes in individual cells was assessed with 10,000 random draws of similarly sized data sets.

Presence of Signal Peptides and Transmembrane Domains in Target Genes
We searched candidate genes for signal peptides using SignalP v4.1 (Petersen et al. 2011) and for transmembrane domains using TMHMM v2.0 (Krogh et al. 2001). Overall, the genome of M. leidyi encodes fewer signal peptides and transmembrane domains than does the human genome (supplementary fig. 3, Supplementary Material online); however, this may simply reflect the fact that ctenophore sequences were not included in the training sets for the SignalP and TMHMM algorithms. To test whether the numbers of signal peptides and transmembrane domains identified by SignalP and TMHMM in our candidate gene data sets were greater than expected by chance, we built 10,000 randomly assembled size-matched data sets from ML2.2 (N = 189 for colloblast candidates and N = 165 for tentacle candidates). We then ran SignalP and TMHMM on these random sets to determine how many produced more signal peptides and transmembrane domains than our candidate sets.

Amino Acid Composition and Low Complexity Sequences
We determined the amino acid composition of the colloblast candidate and tentacle candidate data sets and compared them to 10,000 randomly assembled size-matched data sets. To determine whether these candidate data sets had high numbers of low-complexity sequence stretches, we used Segmasker v1.0.0 (Wootton 1994) to identify regions of low complexity in these data sets as well as in the random data sets.
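The composition step reduces to counting residues across a set of proteins; the same function applied to each random size-matched set would give the background distribution for each amino acid. This is a minimal sketch, assuming residues are pooled over all sequences rather than averaged per protein and that non-standard symbols are ignored; the sequences are toy examples, and low-complexity masking with Segmasker is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Pooled frequency of each standard amino acid across a protein set.

    Residues are pooled over all sequences (not averaged per protein), and
    non-standard symbols such as X, U, or the stop '*' are ignored.
    """
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(res for res in seq.upper() if res in STANDARD_AA)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in STANDARD_AA}


comp = aa_composition(["MKKG", "gGsX*"])
print(round(comp["G"], 3))  # 0.429 (3 of 7 standard residues)
```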

Sequence Similarity to Known Adhesive-and Toxin-Related Proteins
To identify putative adhesive genes, we used BLASTP v2.5.0 (Altschul et al. 1990) to search candidates against the Uniprot database concatenated with the 48 adhesive proteins reported previously (Hennebert et al. 2015). To identify venoms/toxins, we searched candidates against the ToxProt database, a Uniprot database annotated for known venom/toxin genes by the Animal Toxin Annotation Project (www.uniprot.org/program/Toxins). The Uniprot database was downloaded on October 13, 2017 and the ToxProt database was downloaded on September 5, 2017. Hits with E ≤ 1e-03 were considered significant.
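Applying the E-value cutoff is a filter over the BLASTP hit table. The text does not state which output format was used, so this sketch assumes BLAST+ tabular output (`-outfmt 6`) with its 12 default columns, in which the E-value is the 11th field; it also assumes the cutoff is inclusive and keeps only the lowest-E-value hit per query. Query and subject identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple

E_CUTOFF = 1e-3  # significance threshold; treated as inclusive (assumption)


def best_significant_hits(lines: Iterable[str],
                          cutoff: float = E_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Map each query to its lowest-E-value subject among hits with E <= cutoff.

    Expects BLAST+ tabular output (-outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g., from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best


blast_tab = [
    "ML_q1\tsubj_adhesive_1\t45.0\t100\t53\t2\t1\t100\t1\t100\t1e-20\t150.0\n",
    "ML_q1\tsubj_toxin_1\t30.0\t80\t54\t2\t5\t85\t2\t82\t5e-05\t60.0\n",
    "ML_q2\tsubj_other_1\t25.0\t60\t43\t2\t1\t60\t1\t60\t0.01\t35.0\n",
]
print(best_significant_hits(blast_tab))  # {'ML_q1': ('subj_adhesive_1', 1e-20)}
```

Keeping every significant hit rather than only the best one would simply mean appending to a list per query instead of replacing the stored tuple.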

Domain Similarity between Candidates and Known Adhesive-and Toxin-Related Proteins
We used the InterProScan results to test whether our sets of candidate genes disproportionately shared Pfam domains with proteins in the Adhesives and ToxProt databases. To do so, we compared the number of domains shared by the candidate genes with the number shared by 10,000 randomly assembled size-matched data sets drawn from each database.
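The shared-domain statistic can be sketched as a set intersection. This assumes the Pfam accessions for each protein have already been extracted from the InterProScan output (parsing is not shown) and interprets "number of domains shared" as the number of distinct Pfam domains; the domain labels and protein IDs are hypothetical. The same count, computed for each random size-matched set, yields the null distribution.

```python
from typing import Dict, Iterable, Set


def shared_domains(candidate_domains: Dict[str, Set[str]],
                   reference_domains: Iterable[str]) -> Set[str]:
    """Distinct Pfam domains found in the candidates that also occur in a reference set."""
    pooled: Set[str] = set()
    for domains in candidate_domains.values():
        pooled |= domains  # union of domains across all candidate proteins
    return pooled & set(reference_domains)


# Hypothetical per-protein domain sets and a hypothetical database domain list.
candidates = {"ML_c1": {"dom_A", "dom_B"}, "ML_c2": {"dom_C"}, "ML_c3": set()}
adhesive_db_domains = ["dom_B", "dom_C", "dom_Z"]
print(len(shared_domains(candidates, adhesive_db_domains)))  # 2
```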

Cnidocyte Orthology Analysis
Using OrthoFinder (as above), we grouped ML2.2 with the complete set of protein models from N. vectensis downloaded from JGI (https://genome.jgi.doe.gov/Nemve1/Nemve1.home.html) and a subset of proteins identified by Sebé-Pedrós, Saudemont, et al. (2018) as cnidocyte specific but not found in JGI (www.cnidariangenomes.org/download/nve.gene_models.vie130208). We then searched for orthology groups containing at least one candidate (colloblast or tentacle) from M. leidyi and at least one cnidocyte-specific protein. We assessed significance by searching for shared orthology groups in 10,000 randomly assembled size-matched groups drawn from ML2.2 and the NVJ database augmented with additional cnidocyte-specific sequences.
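The search for orthology groups shared between ctenophore candidates and cnidocyte genes can be sketched as two membership tests per group. This is an illustration with hypothetical group and gene identifiers, assuming each OrthoFinder group has been loaded as a set of protein IDs; substituting a random size-matched set for `ml_candidates` gives one draw of the null distribution.

```python
from typing import Dict, List, Set


def shared_orthogroups(groups: Dict[str, Set[str]], ml_candidates: Set[str],
                       cnidocyte_genes: Set[str]) -> List[str]:
    """Orthology groups holding >= 1 M. leidyi candidate and >= 1 cnidocyte gene."""
    return sorted(
        gid for gid, members in groups.items()
        if (members & ml_candidates) and (members & cnidocyte_genes)
    )


# Hypothetical orthology groups mixing M. leidyi (ML_*) and N. vectensis (NV_*) IDs.
groups = {
    "OG1": {"ML_cand1", "NV_cnido1", "NV_other1"},  # candidate + cnidocyte gene
    "OG2": {"ML_cand2", "NV_other2"},               # candidate only
    "OG3": {"ML_other1", "NV_cnido2"},              # cnidocyte gene only
}
candidates = {"ML_cand1", "ML_cand2"}
cnidocyte = {"NV_cnido1", "NV_cnido2"}
print(shared_orthogroups(groups, candidates, cnidocyte))  # ['OG1']
```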

Cell Lineage Tracing in M. leidyi
Experiments were performed as described previously (Martindale and Henry 1997; Martindale and Henry 1999). Briefly, individual micromeres of gastrula stage embryos were microinjected with saturated DiI (DiIC18(3); Molecular Probes, OR, USA) prepared in soybean oil. Embryos were either imaged immediately or reared to the cydippid stage in 0.2-µm-filtered seawater at room temperature before imaging.

Statistics and Code Availability
We used a Monte Carlo approach to assess the significance of our observations. Briefly, we randomly selected 10,000 data sets each of size N = 189 genes or size N = 165 genes from M. leidyi gene models (ML2.2) and compared the distribution of these random draws to colloblast and tentacle candidates, respectively. This approach was used to detect enrichment of late-expressed genes, ctenophore-specific genes, GO annotations, shared Pfam domains, signal peptides, transmembrane domains, regions of low sequence complexity, clustering in individual cells, and clustering with cnidocyte orthologs. Scripts and files for these analyses are in the GitHub Repository for this publication: https://github.com/josephryan/2018-Babonis_et_al_Ryan.
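The resampling scheme can be sketched generically: draw size-matched gene sets from the gene models, score each with the same statistic used for the candidates, and report how often a random set scores at least as high. This is a minimal sketch, not the repository's scripts; the one-sided direction, the (k + 1)/(n + 1) estimator, the fixed seed, and the toy universe (2,000 hypothetical genes, 10% flagged as carrying a signal peptide) are all assumptions made for illustration.

```python
import random
from typing import Callable, Sequence, Set


def monte_carlo_p(observed: float, universe: Sequence[str], size: int,
                  statistic: Callable[[Set[str]], float],
                  n_draws: int = 10_000, seed: int = 1) -> float:
    """One-sided empirical P-value: how often a random size-matched gene set
    scores at least as high as the candidate set.

    Uses the (k + 1) / (n + 1) convention so the estimate is never zero.
    """
    rng = random.Random(seed)
    k = 0
    for _ in range(n_draws):
        draw = set(rng.sample(universe, size))  # sampling without replacement
        if statistic(draw) >= observed:
            k += 1
    return (k + 1) / (n_draws + 1)


# Toy universe standing in for ML2.2; 10% of genes carry a signal peptide.
universe = [f"ML_g{i:05d}" for i in range(2000)]
has_signal_peptide = set(universe[:200])


def count_sp(genes: Set[str]) -> float:
    """Statistic: number of genes in the set predicted to carry a signal peptide."""
    return len(genes & has_signal_peptide)


# A hypothetical candidate set of 189 genes with 60 signal peptides (~19 expected).
p = monte_carlo_p(observed=60, universe=universe, size=189, statistic=count_sp)
print(p)
```

With 10,000 draws the smallest attainable value is 1/10,001, so an observation far outside the null distribution, as in this toy example, is reported at that floor rather than as zero.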

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.