Probing T-cell response by sequence-based probabilistic modeling

With the increasing ability to use high-throughput next-generation sequencing to quantify the diversity of the human T cell receptor (TCR) repertoire, the ability to use TCR sequences to infer antigen-specificity could greatly aid potential diagnostics and therapeutics. Here, we use a machine-learning approach known as Restricted Boltzmann Machine to develop a sequence-based inference approach to identify antigen-specific TCRs. Our approach combines probabilistic models of TCR sequences with clone abundance information to extract TCR sequence motifs central to an antigen-specific response. We use this model to identify patient personalized TCR motifs that respond to individual tumor and infectious disease antigens, and to accurately discriminate specific from non-specific responses. Furthermore, the hidden structure of the model results in an interpretable representation space where TCRs responding to the same antigen cluster, correctly discriminating the response of TCR to different viral epitopes. The model can be used to identify condition specific responding TCRs. We focus on the examples of TCRs reactive to candidate neoantigens and selected epitopes in experiments of stimulated TCR clone expansion.

specic features, such as the biochemical properties of the amino acids that make up the TCR, are 50 central to a productive response. Computational methods can help answer these questions by narrowing 51 down the number of candidate responding TCRs, which accelerates experimental testing, typically a 52 labor-and resource-intensive task. In particular, methods aimed at learning a sequence representation 53 of T-cell response can shed light on the binding mechanisms determining T-cell response specicity 54 at the molecular level. Such identication of sequence motifs is a preliminary step to nding TCR 55 residues involved in antigen recognition. From this point of view, TCR sequence-based approaches can 56 also complement and improve other existing approaches, that are focused either on predicting antigen 57 immunogenicity [9, 10] or on characterizing T-cell expansion [11]. 58 We propose a set of probabilistic-modeling approaches, based on Restricted Boltzmann Machines 59 [1214] and previously dened selection factors [15], to characterize features of responding TCR from 60 sequence data. We apply the method to CD8+ T-cell repertoires stimulated in vitro by patient-specic 61 tumor neoantigens [16] that were predicted by a neoantigen immunogenicity model [9]. We detect a 62 neoantigen specic responses in the peripheral blood of seven long-term survivors of pancreatic ductal 63 adenocarcinoma (PDAC) and compare our predictions to experimentally validated responding TCR. The 64 computational approaches we developed discriminate between specic and non-specic expansion and 65 establish a connection between antigen specicity and sequence motifs in TCRs. The methods are general 66 and we show their usefulness in identifying clusters of TCR responding to specic viral epitopes [17]. 67 Results 68 Dataset structure 69 In the experimental protocol of Balachandran et al. [16], putative immunodominant neoantigens were 70 selected among all short peptides harboring a point mutation identied in patient tumors. The neoantigen 71 choice was guided using a neoantigen tness model [9] that predicts the immunogenic potential of a 72 neoantigen based on sequence homology to known infectious disease-derived antigens, and its relative 73 MHC-I binding anity when compared to the wild-type peptide from which it was derived. Peripheral 74 blood mononuclear cells (PBMCs) of seven patients (labelled here as Pt1, ..., Pt7) were pulsed with 75 the putative immunodominant neoantigens (NA), the model [9] predicted homologous infectious disease-76 derived antigens (cross-reactive antigens, CR), and their corresponding unmutated versions (wild-type 77 antigens, WT). In addition, neoantigens were also found to be enriched in the genetic locus MUC16, 78 a known tumor-specic antigen. As a result, MUC16-derived neoantigens were also tested for an in 79 vitro response in two patients (Pt1 and Pt2). Overall, this dataset provides 23 antigen-specic T-cell 80 repertoires, one for each antigen tested across the seven patients, as summarized in Table 1. 81 The response is monitored in terms of TCR clonal expansion 21 days after stimulation (Fig. 1). 82 Sequence reads of the complementarity-determining region 3, CDR3 (the region in contact with antigens) 83 of the TCR β chain (TRB) were obtained both before and after antigen stimulation. We will refer to 84 these lists of sequence reads as the day 0 and the day 21 datasets respectively, providing snapshots 85 of the TCR composition of patient repertoires at day 0 (before antigen stimulation) and at day 21 (after 86 in vitro stimulation). A clonotype was dened to include all TRBs with identical CDR3 amino acid 87 sequences and we will use this denition. Ref. [16] assessed TRB clonal expansion using ow cytometry 88 and CDR3 sequence reads. They found identical TRB clones that were signicantly expanded in response 89 to both NA and CR in all patients and neoantigen-reactive clones that were also present in the same 90 patient's archival primary tumors in 5 out of 7 patients. 91 Our goal is to learn from CDR3 sequence data sequence-level models capturing information about 92 the response to specic antigens in each T-cell repertoire. We use TRB clonal abundance to train models 93 that identify responding clonotypes. We assign to each CDR3 sequence a multiplicity (counting up the 94 corresponding reads) and we construct sequence datasets where sequences are replicated as many times 95 as given by their multiplicity. In this way, the contribution of each CDR3 sequence is eectively weighted 96 by counts of sequence reads and we will refer to these training datasets as count-weighted datasets (Fig.  97 1B). 98 Probabilistic models inferred from sequence data 99 We learn a distribution of the probability of nding a specic CDR3 sequences in a given repertoire. 100 The statistics describing CDR3 amino acid usage evolve from day 0 to day 21, reecting the underlying 101 evolution of the T-cell repertoire composition induced in the experiment. TRB clonal expansion modies 102 the amino acid sequence statistics (Fig. 1B). The goal of our inference frameworks is to reconstruct from 103 these changes in the CDR3 sequence statistics the constraints (or patterns) at the level of amino acid 104 sequences that characterize antigen induced expansion. We consider three methods: the Restricted 105  (Fig. S1). To learn the RBM, we rst 117 need to reduce sample sequences to the same length by performing a CDR3 sequence alignment [18]. 118 We propose for CDR3s a novel alignment procedure where we rst build an alignment prole of all the 119 sequences of the same length and then progressively re-align proles of increasing length to obtain a 120 multiple sequence alignment of length N σ = 19 amino acids (see Methods, [19]). For sequences that 121 share strongly conserved anchor residues, as in the case for CDR3, the resulting alignment concentrates 122 gaps, due to variable CDR3 length (Fig. 1C), in the middle of the sequence and away from the conserved 123 residues. In Fig. 1B gaps are indicated by`−' symbols in the lists of CDR3 sequences but excluded from 124 logos to emphasize amino acid usage. 125 SONIA. We also consider a recent computational tool, SONIA [15] that infers selection factors acting 126 on the TRB chain from RepSeq datasets. SONIA quanties selection pressures on amino acids in the 127 TCR sequences in terms of position-specic selection or q-factors, by comparing the probability of 128 nding a given sequence in a post-stimulation repertoire of interest compared to a baseline repertoire. 129 In the context of this paper, we model TRB clone expansion as a form of selection pressure imparted by 130 antigen stimulation and learn two SONIA models for the probability of nding a CDR3 sequence, one 131 before and one after antigen stimulation. Specically, SONIA ts the statistics of count-weighted CDR3 132 datasets by an independent-site model, xing the same background distribution for both datasets to 133 the probability of generation of these sequences p gen (Eq. 7 in Methods). We used SONIA models with 134 Left+Right feature encoding [15] (Fig. 2B). Each CDR3 sequence is encoded by a set of features labeled 135 by the distance of each amino acid from the left end of the CDR3 (from the start CDR3 anchor residue) 136 and from the right end (from the end CDR3 anchor residue), up to a maximum distance that we set to 137 N σ = 19. This left-right encoding is consistent with the results of the alignment procedure (see e.g. Fig.   138 2). 139 RBM-Left+Right (RBM-LR). Finally, we also implemented an RBM version which does not 140 rely on a rst step of sequence alignment by performing a Left+Right CDR3 encoding as in the SONIA 141 Left+Right model [15]. We refer to this RBM version as RBM-LR. The Left+Right data encoding is 142 suitable only for biological sequences that display strongly conserved start and end residues, such as 143 CDR3. It has the advantage of not relying on multiple sequence alignments, which are sample-and 144 procedure-dependent. We use the RBM-LR model to show that our conclusions do not depend on 145 data encoding (Fig. S2): since the RBM-LR model does not require any alignment, it avoids producing 146 alignment gap artifacts found in the RBM motifs, however its training time is substantially longer 147 (Methods). 148 We have summarized the characteristics of these 3 inference approaches in Table 2. 149 Model parameters are potential tools for motif discovery 150 The sequence-based approaches in the previous section provide a scoring scheme that associates sequence 151 abundance, which is informative about TRB response, to sequence features, that embed molecular details 152 related to TRB specicity (Fig. 1). In particular, the inferred model parameters identify and allow us to 153 derive sequence motifs containing CDR3 residues characteristic of antigen-specic response (Fig. 2). The 154 RBM biases g i (σ i ) identify conserved positions along the CDR3 that are mainly a result of preferential 155 usage of these amino acids during the generation process. Similarly SONIA q-factors disentangle, in a 156 site-specic way, enrichment in amino acid usage due to clonal expansion from the generated frequency 157 distribution (incorporated in the reference probability p gen Model scores correlate to T-cell clonal expansion 163 From the dataset incorporating clonal multiplicity upon stimulation by antigen P at day 21, the RBM 164 approach and SONIA learn the probability p P 21 (σ) that a given TRB clone σ is detected with high 165 abundance post-stimulation (Fig. 2). 166 As a preliminary step, we asked whether p P 21 (σ) correlates with CDR3 counts at day 21 for sequences 167 in a test set (see Methods). This indicator of performance is extremely poor when we use all the clones, 168 since low-count clones are a source of noise. If we lter out low-count clones, progressively restricting the 169 test set to the clones with highest abundance, we recover a signicant correlation and RBM performance 170 is generally better than the SONIA biophysical model based on the independent-site assumption (Figs. 171 S3 and S4B,E). 172 We next introduced a probabilistic score of response for a clone σ to stimulation by antigen P , 173 S P resp (σ) = log p P 21 (σ)/p 0 (σ) , which measures the ratio of the model probability assigned to sequence 174 σ in the P -specic repertoire after stimulation, p P 21 (σ), relative to the one before stimulation p 0 (σ). We 175 tested the response score's ability to correlate to actual clonal expansion under the same stimulation P , 176 quantied by the clone's fold-change with respect to its abundance at day 0 (see Eq. 1 in Methods). Fig.  177 3 shows good performance, with correlation coecients ranging between 0.68-0.83 (RBM) and 0.51-0.71 178 (SONIA), the only exceptions being visible in datasets Pt7 and Pt4 WT. Here, relatively high counts at 179 day 0 obstruct information about the antigen-specic response at day 21. Response scores S P resp (σ) can 180 also be dened in terms of the TCR probability at day 21 relative to baseline distribution p gen (σ) [20].

181
These scores capture information about the overall selection pressures acting on the sequence, including 182 thymic and antigen specic selection [15] (see Methods). The advantage of this choice is that p gen 183 can be estimated in silico [20], without the need for longitudinal datasets that include sequence data 184 at time points before stimulation, which are not always available. More generally, S P resp (σ) can be 185 estimated relative to other background distributions at our disposal, accounting for TCR statistics in 186 the peripheral blood in normal, unstimulated conditions. In Fig. 3A, we show that the model prediction 187 for antigen specic expansion at day 21, S P resp (σ), is robust with respect to the choice of the background 188 distribution, suggesting we are capturing a response to this specic antigenic challenge. 189 Model scores serve as predictors of specicity of T-cell response 190 The datasets considered portray the response of the same, individual TRB repertoire to dierent condi-191 tions determined by cultures with dierent antigens. We next asked whether the model is able to detect 192 some condition-related specicity in these responses. 193 The model's response scores S P resp (σ) quantify dierential degrees of expansion of a TRB clone σ 194 in response to stimulation by antigen P (Fig. A-B). By comparing response scores to dierent antigens 195 tested for the same patient, we dene a score of specicity of response of clone σ to a given antigen P , S P spec (σ), see Eq. 10 in Methods. The score assigns positive values to TRB clones that are specic responders to the antigen and negative values to the clones unspecic to it, as shown for Pt3 in Fig. 4A. 198 Here, the expansion in response to the two tested antigens (the neoantigen NA, the cross-reactive CR) is 199 mostly specic to each of them. Specicity scores whose values lie around zero denote both clones that 200 did not expand at all and clones that expanded under both stimulations (cross-reactive clones), see Fig.  201 4B. Cross-reactive clones are expected to arise due to the sequence similarity between CR and NA and 202 were documented in Ref [16]. For Pt7, both specic and unspecic responders give similar distributions 203 of specicity scores, due to a less marked dierence in fold change under the two stimulations (Fig. 4A), 204 signaling that in Pt7 expansion is less specic. 205 We can set a threshold score value to discriminate specic from unspecic responders and, by varying 206 this discrimination threshold, we can build a Receiver Operating Characteristic curve (ROC) describing 207 the fraction of specic responders to the antigen predicted by the model's score, against the fraction of 208 predicted unspecic responders. The AUROC (Area Under the ROC) can be taken as a quantitative 209 indicator of the specicity of the expansion upon exposure to a particular antigen at the repertoire level. 210 When assessing specicity, we found that RBM and RBM-LR give consistent answers, see To perform a comparative analysis of these sequence motifs, we applied GLIPH [21,22], an algorithm 232 that clusters TCR sequences based on putative epitope-specicity (Fig. S8). Most expanded clusters 233 recovered by GLIPH can be associated with expanded sequences represented in RBM space in Fig. 5. 234 Beyond these common hits, GLIPH and RBM reveal complementary information, reecting their dier-235 ent purposes. While GLIPH focuses on nding clusters of similar sequences, in our RBM approach we use 236 information on expansion to appropriately weight sequences. As a result, GLIPH reports well-clustered 237 but not necessarily expanded motifs, while RBM reports highly expanded but sometimes isolated se-238 quences. Only sequences that are both well-clustered and expanded are captured by both algorithms. 239 To further illustrate the potential of the RBM low-dimensional representation, we considered human 240 TCR validated to be specic to common viral epitopes by MHC tetramer-sorting [17,21]. We tested the 241 (restricting to expanded clones) or the repertoires specic to pp65 495 , BMLF1 280 from [21], we found that the average model performance is signicantly decreased (Fig. 7D). Since tetramer-sorted datasets do 279 not include count information, this performance reects the ability to predict unseen sequences, which 280 is a more stringent test than predicting the abundance of observed sequences. A similar test can be 281 performed on the PBMC stimulation experiments from [16] by separating TCR sequences into training 282 and testing sets, in such a way that a given sequence could not be found in both. Doing so reveals 283 relatively poor predictability (Fig. S9), which is due to the very high diversity of responding clonotypes 284 in these datasets. In general, we veried that the model's ability to recover specic clones is inversely 285 correlated with diversity ( Fig. 7E and S9A). training data sets in the leave-one-out validation protocol (Fig. S9). SONIA performance is correlated 302 to RBM performance (Fig. S9C), albeit still poor, supporting that the ability to generalize to unseen 303 clonotypes is weakly dependent on the model and data encoding chosen, rather it is largely determined 304 by the heterogeneity of the repertoire. In general, the predictive power of sequence-based models is 305 largely improved when they are trained on less diverse data sets as tetramer data from the M1-specic 306 repertoire from Ref. [17]. 307 The RBM also has the ability to project the response onto dierent (groups of ) clonotypes (Fig. 5), 308 which is useful for disentangling clones specic to multiple antigens within the same sample. Combining 309 dierent datasets describing the response to specic antigens gives hope of linking the specicity of 310 response to distinct TRB sequence features. 311 Any generative approach for the sequence probability could be used, for example VAE models [27,28]. 312 While for tting post-thymic selection TCR distribution methods capturing non-linearities did not seem 313 to be essential (there VAE performs as well as SONIA [29]), for building models of TCR response we 314 expect the VAE to reach an accuracy similar to RBM. 315 Identifying TCR reactivity to an antigen that is a single mutation away from a self-antigen (neoanti-316 gen), but that is not reactive to the self-antigen itself (the corresponding wild-type), is of major impor-317 tance for personalized cancer immunotherapies, such as vaccine design. Moreover, the inferred probabilis-318 tic score renders our models generative, in principle, implying one can predict new TCRs that recognize 319 an antigen, a point critical to the ability to engineer T cells with a given specicity. It would be inter-320 esting to validate, by assays probing the response to neoantigens and wild-type antigens, to what extent 321 our modeling approach is able to distinguish the wildtype-specic receptors from the neoantigen-specic 322 receptors, and whether one can then generate new TCRs with comparable specicity. 323 We have benchmarked the model with datasets describing individual repertoires elicited by dierent 324 known antigens, without using any information about the response-epitope association. Our approach is 325 broadly applicable to scenarios where only single or multiple snapshots of TCR repertoires (and not the 326 stimulating epitopes) are available. In these cases, our method can assess whether TCR-driven immune 327 response is occurring and how specic it is across dierent conditions, time points, and tissues. It could 328 be used to discriminate clones specically responding to dierent, unknown targets or to detect changes 329 in repertoire clone composition at dierent stages of disease progression. In addition, in Figs. 6 and 7B-E 330 we show an application of the RBM based model to datasets of unique, only responding clones identied 331 by direct TCR binding to MHC-tetramer reagents where a non-responding dataset is not available. 332 Errors in the inference procedure can arise due to incorrect estimates of clone abundances in data 333 without unique molecular barcodes [6, 30]. Low-count sequences, lying a few mutations away from 334 responding clones, may represent variants of high-frequency clonotypes arising from sequencing errors 335 and could be discarded by setting thresholds in counts below which clonotypes are ltered out. We used 336 a count-based ltering threshold only when correlating model scores to clonal abundances (Figs. 3, S3). 337 We used a variable threshold parameter, avoiding having to estimate a precise cuto for clone expansion 338 from counts. Such estimation can depend on count sensitivity to experimental conditions and sequencing 339 protocols and it requires to rst infer the expected level of noise from replicates of the experiment [11]. 340 Replicates were not available here, as is often the case for RepSeq data. When replicates are available, the 341 choice of a more systematic count-based ltering procedure can be used using a probability of expanded 342 clonotypes [11]. Re-weighting sequences by their probability of expansion could provide also a robust 343 solution to correct for these biases and constitutes an avenue for future work. 344 In the neoantigen stimulated data, we observed, perhaps unsurprisingly, a lower degree of repertoire 345 focusing around a few, similar TRB clones than in examples of epitope-specic repertoires from MHC-346 tetramer assays for viral antigens (Fig. 7). We cannot exclude that this eect is due to the type of assay, 347 where, dierently from MHC-tetramer assays, responding clones are not isolated by direct binding to the 348 antigen. Only some of the responding clones may be antigen-specic, while others may have expanded in 349 response to other, not directly controlled, culture conditions. Additional experimental tests are necessary 350 to disentangle the heterogeneity of response induced by dierent molecular targets and the heterogeneity 351 of response due to other recognition modes of the same epitope. If the response described here is not 352 antigen specic, it still remains the response detected by clonal expansion and our models describe a 353 condition-specic rather than epitope-specic response. This dierence in interpretation is important The color code gives their specicity score S N A spec in absolute value (note that |S N A spec | = |S CR spec |). High |S N A spec | is assigned only to clones whose expansion was more signicant in one experiment than in the other, while clones that are signicantly expanded in both experiments (highly cross-reactive) are assigned |S N A spec | closer to zero. These clones do not contribute to discriminating the two repertoires.     The main dataset studied consist of TRB CDR3 regions sequenced by the Adaptive Biotechnologies 369 ImmunoSEQ platform from the T-cell assays of Ref. [16]. We discard sequences that are generated by 370 non-productive events, i.e., they do not have the conserved anchor residues delimiting the CDR3 region 371 (Cysteine as the right anchor residue, Phenylalanine or Valine as left one) and they align to pseudo-genes 372 as germline gene choices. We collapse sequences with the same CDR3 composition (including sequences 373 with dierent V-J genes) into the same clonotype and assign, to each clonotype, a multiplicity (number 374 of sequence reads) that sums up multiplicities of all the dierent DNA sequences coding for that same 375 CDR3. The fraction of reads with the same CDR3 but dierent V-J genes was small (in average 0.06). 376

Characterization of TRB clone expansion 377
In the PBMC culture stimulated by antigen P , a CDR3 sequence σ is detected with a multiplicity that 378 we denote by N P t (σ), at time t = 21 days. We dene the fold change F C P (σ) for a sequence σ in the 379 repertoire measuring response to antigen P as: where N 0 (σ) is the multiplicity of sequence σ in the blood sample from the corresponding patient at 381 day 0 (before any antigen stimulation). In Eq. 1, we add to all counts a 1/2 pseudo-count, as a standard 382 procedure to avoid ill-dened fold change with very low counts [31]. 383 To isolate a pool of expanded clones for each stimulation condition, we follow the same criteria as where Γ µ I(σ) = log dh e −Uµ(h)+hI . The input to the hidden unit µ, I µ (σ), coming from the observed 419 sequence σ is given by: We take U µ (h µ ) in the form of a double Rectied Linear Unit (dReLu) potential: a form that gives the model high expressive power, allowing to t higher order correlations in the input 422 data, see [14,32]. All RBM parameters, weights w i,µ (σ i ), local potentials g i (σ i ) and the parameters 423 specifying U µ (h µ ) -here (γ µ,+ , γ µ,− , h + , h − ) -are inferred from data by maximization of the average 424 likelihood L = log p(σ) data of sequence data σ through a variant of stochastic gradient ascent, as described in [19]. In this way, the RBM probability distribution is inferred in such a way as to optimally 426 explain the statistics of our count-weighted training datasets see Figs. S4 E,H for comparison of data 427 and model single-site frequency and pairwise connected correlations. Penalty terms (regularizations) can 428 be introduced in the log-likelihood maximized during training to prevent overtting. Following [14], we 429 use a L 1 -type regularization: where λ 2 1 is the regularization strength.

431
For RBM models learned from count-weighted datasets (Figs. 3-5), we use the performance of the 432 RBM likelihood L = log p P 21 (σ) at recovering the fold-change increase on the validation set (a held-out 433 20% of data) to search for optimal values of regularization (λ SONIA is a tool of probabilistic inference from TRB sequence data, The inferred probability of seeing a 445 given TRB sequence in the studied repertoire is parametrized in terms of an independent-site model: 446 where we take σ to represent only the CDR3 amino acid sequence, of length l, of the TRB chain. A, as the product: where the left (superindex L) and right (superindex R) terms carry information about the position i of 454 the amino acid A from the left and on the position i − N σ − 1 from the right (see Fig. 2B for an example). Scoring T-cell response 474 We train probabilistic models (RBM, SONIA) on count-weighted datasets at t = 0, 21 for each antigen 475 P . For a given probabilistic framework (RBM, SONIA), we denote the probability assigned to sequence 476 σ by the model trained on the repertoire snapshot at time t under stimulation by epitope P as p P t (σ).

477
We dene the response score of a given CDR3 clonotype sequence σ: the model likelihood assigned to sequence σ at day 21 relative to the one at day 0 in the same patient.

479
The term p 0 (σ) can be in principle replaced by the probability provided by other background models. To test the power of a model trained on the NA sample to discriminate responses that are specic to it 489 from the ones that are specic to the CR sample, we calculate a specicity score of response to NA as: 490 If the responses to NA and to CR are specic, we expect S N A spec (σ) > 0 for σ in the set of sequences such 491 that F C N A (σ) > F C CR (σ) and S N A spec (σ) < 0 for σ such that F C CR (σ) > F C N A (σ), as is seen in the 492 histograms of Fig. 4A for Pt3. When, for the same individual, we have samples for several antigens p 493 (Pt1, Pt2, Pt4, Pt6, see Table 1), we take as specicity score of response to a given epitope P :

494
S P spec (σ) = S P resp (σ) − max p S p resp (σ), where the max is taken over the antigens p = P tested for the same individual. In this case, a sequence σ is considered a specic responder to epitope P if max p F C p (σ) = F C P (σ), while it is considered 496 as an unspecic responder to the antigens p = P (see the distribution of S P spec and F C P for the Pt6 497 samples in Fig. A-B). Note that, using Eq. 9 in Eq. 11, S P spec (σ) reduces to log p P 21 (σ) − max p log p p 21 (σ), 498 as we compare repertoires from the same patient and the background p 0 in Eq. 9 is only patient, and 499 not antigen, dependent. Antigen-stimulation experiments can still be compared two by two as in Eq. 10, 500 allowing us to estimate matrices of pairwise AUROC, see Fig. C. 501 One can adjust a threshold above which the specicity score S P spec (σ) recovers clones specically ex-502 panded under the P stimulation. By varying the discrimination threshold, we build a receiver operating 503 characteristic curve (ROC curve) to test the diagnostic ability of the score S P spec (σ) to detect the speci- To validate model predictions, we divide randomly every count-weighted sample into a training set 512 (80%) and a testing set (20%). Model performance at capturing response and its specicity (through the 513 response score in Eq. 9 and the specicity score in Eq. 11) is evaluated on the testing set by measuring 514 the correlation of response scores to clone fold change (Fig. 3) and by the AUROC metric of specicity 515 (Fig. 4). In this type of validation, sequences in the training and testing set are not distinct per se, but 516 the counts reecting expansion for particular sequences are dierent in the two sets. In short, the reads 517 are randomly distributed between the training and testing sets. In the 5-fold leave-one-out validation 518 protocol (Fig. S9), we instead consider samples of unique CDR3 clones and divide them into 5 sets. 519 We use 4 sets for training models (weighting sequences by counts) and 1 set to validate the model's 520 performance. We repeat the model training for the 5 possible training/testing partitions and we consider 521 the average model's performance over these 5 trainings. In this protocol the same sequence cannot be 522 found in dierent sets. The same 5-fold leave-one-out validation protocol was applied also for training 523 and testing RBM models built from pools of responding-only clones, see Repertoire Diversity Index 530 We consider a measure of repertoire diversity that generalizes the Simpson's diversity [24,25] (based 531 on the probability of drawing the same clone in two independent repertoire samples) to include clone 532 similarity instead of identity only. Following Dash et al. [17], such a measure can be dened by weighting 533 each pairwise clone comparison by a smooth, Gaussian-like factor of the inter-clone distance. We consider 534 the pools of only expanded clones for each assay and we look at all possible, ordered, pairwise comparisons 535 between clones of the same pool to estimate: 536

RBM-LR
Weights (Left-Right) Figure S2: Model parameters inferred by the RBM-LR model. Same representation as in Fig. 2A where, by starting with CDR3s in the Left+Right encoding, 2 sets of RBM weights, w L and w R , are inferred. We  Pairwise AUROC Pt 6 Figure S5: RBM prediction of response specicity. A-B: Dierential degree of clone expansion under stimulation P is mapped by the model into dierential response scores S P resp , enabling a model-based assessment of response specicity. Here we consider sequences from sample Pt6 NA1, where clone abundance reects response to NA1. The fold change due to NA1 stimulation, F C N A1 , is on average higher than the fold change measured for all the other antigen P tested in the same patient, F C P (A), suggesting a specic response. This set of NA1-specic responders is on average assigned higher response score by the RBM model trained on the Pt6 NA1 dataset, S N A1 resp , than by models trained on the other Pt6 samples, S P resp , where the same sequences behave as unspecic responders (B). All models are trained on 80% of a given sample, and we show average log fold change log F C P and response scores S P resp (both expressed as log base 10) over the Pt6 NA1 testing set. C: Matrices of RBM pairwise AUROC for Pt1, Pt2, Pt4 and Pt6 (patients for whom more than 2 antigens were tested, see Table   1). The value at each row/column intersection gives the AUROC (estimated through RBM scores) of response specicity between the antigen to which the row and column refer. to each CDR3 clone σ is factorized over CDR3 positions, i.e. p P t (σ) = N σ i=1 p P t,i (σi) and p P t,i (σi) is taken as the frequency of amino acid σi at position i in the P -specic repertoire at time t = 0, 21 (days). PWM performance in terms of correlation between response score S P resp and clone fold change (see Fig. 3) and AUROC of specicity (see Fig. 4) is compared to SONIA (A,C) and RBM (B,D) for all samples from [16].  and testing set, as a function of the dataset size (ltered by counts). A correlation in the testing set higher than zero is recovered only when retaining the most abundant clones: the dataset size chosen for the points in A,C -indicated by the dark red dot -corresponds to ∼ 160 sequences. The trend, markedly dierent for training and testing set, signals overtting that is unavoidable when the response is heterogeneous (as quantied by the diversity index, Fig. 7A). C: Correlation to clone abundance by RBM -same quantity as in A -is compared to the one obtained by SONIA.