Closed-loop optimization of chromatography column sizing strategies in biopharmaceutical manufacture

BACKGROUND This paper considers a real-world optimization problem involving the identification of cost-effective equipment sizing strategies for the sequence of chromatography steps employed to purify biopharmaceuticals. Tackling this problem requires solving a combinatorial optimization problem subject to multiple constraints, uncertain parameters, and time-consuming fitness evaluations. RESULTS An industrially-relevant case study is used to illustrate that evolutionary algorithms can identify chromatography sizing strategies with significant improvements in performance criteria related to process cost, time and product waste over the base case. The results demonstrate also that evolutionary algorithms perform best when infeasible solutions are repaired intelligently, the population size is set appropriately, and elitism is combined with a low number of Monte Carlo trials (needed to account for uncertainty). Adopting this setup turns out to be more important for scenarios where less time is available for the purification process. Finally, a data-visualization tool is employed to illustrate how user preferences can be accounted for when it comes to selecting a sizing strategy to be implemented in a real industrial setting. CONCLUSION This work demonstrates that closed-loop evolutionary optimization, when tuned properly and combined with a detailed manufacturing cost model, acts as a powerful decisional tool for the identification of cost-effective purification strategies. © 2013 The Authors. Journal of Chemical Technology & Biotechnology published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.


NOTATION
x Candidate solution f Objective function to be optimized k Total fixed number of chromatography steps in a manufacturing procress h i Column bed height at chromatography step i (cm) d i Column diameter at chromatography step i (cm) n CYC,i Number of cycles each column at chromatography step i is used for n COL,i Number of columns operating in parallel at chromatography step i V i Total volume of resin available at chromatography step i (L) T i Processing time of chromatography step i (h) M i Mass of product entering chromatography step i (g) DBC i Dynamic binding capacity of the resin used at chromatography step i (g/L) P Annual product output (kg) D Annual demand (kg) m Number of Monte Carlo trials

INTRODUCTION
Monoclonal antibodies (mAbs) represent the fastest growing category of therapeutic biopharmaceutical drugs due to their unique binding specificity to targets. Yet, their manufacture is costly and time-consuming. The manufacturing process for mAbs can be divided into two phases ( Fig. 1): upstream processing (USP) and downstream processing (DSP). In USP, mammalian cells expressing the mAb of interest are cultured in bioreactors. Following cell culture, the broth moves to DSP, where the mAb is recovered, purified and cleared from potential viruses using a variety of operations including a number of chromatography steps, such as affinity or ion-exchange chromatography. Chromatography operations are identified as critical steps in a mAb purification process and can represent a significant proportion of the purification material costs (associated, for example, with the use of expensive affinity resins and large amounts of buffer reagents). While alternatives to traditional column chromatography platforms are emerging, industry practitioners are still reluctant to perform major process changes. 1,2 At the same time, it is important to determine how best to use existing production facilities for mAbs. 3,4 This is particularly challenging given the significant improvements in USP productivities that have been accomplished over the past decade with higher mAb concentrations (titers) being achieved in cell culture. These improvements have not been matched in purification capacities, leading to concerns over purification bottlenecks and the desire to continuously optimize the design and operation of existing chromatography steps. Hence, to efficiently exploit these cell culture improvements, and account for the growing market for therapeutic mAbs, it has become critical to identify cost-effective purification processes. 1,2,5 An approach to tackle these issues, which is also adopted here, is to develop simulation models of mAb manufacturing processes and locate promising chromatography setups using computational methods. For example, Joseph et al. 6 present a simulation model to identify windows of operation for the column diameter, bed height and loading flowrate of a chromatography step using productivity and cost of goods (COG) as performance criteria. A model to find combinations of protein load and loading flowrate that meet yield and throughput constraints has been developed by Chhatre et al. 7 The discrete-event simulation framework proposed by Stonier et al. 8 allows the selection of optimal chromatography column diameters over a range of titers. The methodology used by these authors 6 -8 consists of selecting and evaluating specific values within the full range of variation of the critical parameters. However, such an approach may not be feasible for larger decision spaces as considered here, where the number of possible permutations of options means that they can no longer be examined individually. Recently a set of mathematical programming approaches to address chromatography column sizing and sequencing problems were presented by Liu et al. 9,10 These methods validated the outcomes of the evolutionary algorithms created in this paper for small problem instances but were found to reach limitations when attempting to solve large problem instances under uncertainty with commercial solvers. This drives the need for more efficient combinatorial optimization methods in this domain.
To address this issue, Simaria et al. 11,12 conducted a study on the application of evolutionary algorithms (EAs) to optimize mAb purification processes. EAs represent an example of metaheuristic combinatorial optimization techniques that function by iteratively evolving a population of candidate solutions using search algorithms based on the principles of natural selection. 13 Simaria et al. 11,12 found that EAs are efficient optimizers in this domain but also recognized that their performance could be improved by further tuning of the EAs. This study conducts a more thorough investigation on this topic. Beginning with a more formal definition of the chromatography equipment sizing problem, this work proposes guidelines on how to tune EAs for this problem and demonstrates the usefulness of certain EA properties when it comes to incorporating user preferences and final decision making.
This study investigates the application of EAs for the identification of chromatographycolumnsizingstrategies -defined here by the diameter and bed height of a column, the number of columns used in parallel, and the number of cycles a column is run for -that are cost-effective in terms of COG per gram (COG/g) of product manufactured. This task can be formulated as a combinatorial (single-objective) optimization problem subject to multiple constraints and interacting decision variables, uncertain parameters and expensive fitness evaluations (represented by time-consuming computer simulations).
The type of problem considered here -where solutions are evaluated by simulation rather than computing some function available in algebraic form -is commonly referred to as a closedloop optimization problem. 14 -16 The term closed-loop suggests that the setup in such problems establishes an interactive loop between an optimizer and the experimental platform, represented here by the simulator. Figure 2 illustrates the closed-loop setup commonly used in experimental optimization including in this work. Over the years, evolutionary algorithms (EAs) have proven to be efficient, flexible and robust optimizers for a number of closed-loop problems in areas such as shape design optimization, quantum control, drug discovery, analytical biochemistry, fermentation media optimization , marine biosurfactant production, batch distillation design, and biopharmaceutical portfolio management combined with capacity planning. 17 -27 For a detailed introduction into the field of closed-loop optimization including challenges, applications, and algorithms employed, please refer to Allmendinger. 14 An industrially-relevant case study is used to investigate how to tune some of the EA configuration parameters: population size, degree of elitism, number of Monte Carlo trials (needed to cope with uncertain parameters), and constraint-handling method. The fitness landscape of different scenarios of the case study is analyzed also to observe which landscape features pose a particular Figure 3. Schematic of the closed-loop platform employed comprising an optimization algorithm (EA) linked to a detailed process economics model; the interaction between the two components is supported by a database. The process economics model (its structure is illustrated using a UML diagram) performs mass balance and cost calculations and the EA determines the best chromatography equipment sizing strategies. A chromatography equipment sizing strategy is defined by the column bed height h i , column diameter d i , number of cycles n CYC,i and number of columns n COL,i for each chromatography step i = 1, . . . , k employed. Different sizing strategies may yield the same overall column volume but nevertheless vary in the objective COG/g. wileyonlinelibrary.com/jctb challenge when optimizing equipment sizing strategies. A comparison between the equipment sizing strategies used by manufacturers (regarded in this paper as the base case) and solutions provided by the EA is presented and a practical procedure based on visual tools for selecting the most preferred sizing strategy is illustrated.

METHODOLOGY Problem domain: chromatography equipment sizing
The chromatography equipment sizing problem can be represented as a combinatorial optimization problem with the task of finding the most cost-effective chromatography sizing setup for a sequence of chromatography steps used in the purification process of mAbs. A typical mAb manufacturing process, as shown in Fig. 1, consists of a number of packed-bed chromatography steps, with the chromatography sizing strategy selected having a direct impact on metrics such as manufacturing costs and time, and annual product output. To locate effective sizing strategies, the closed-loop setup illustrated in Fig. 3, which links an optimization algorithm with a process economics model, is employed. In the following the decision variables, objective function, constraints, and uncertain parameters, to which the chromatography equipment sizing problem considered here is subject to, are described. Figure 4 shows the encoding used to represent a solution x to the chromatography sizing problem. For each chromatography step 1 ≤ i ≤ k (k is the total number of steps) four discrete decision variables were defined related to the sizing and operation of chromatography columns: bed height h i and diameter d i of columns, number of cycles n CYC,i each column is used, and the number of columns n COL,i operating in parallel. That is, the problem is subject to k × 4 discrete variables in total. This captures the trade-offs of using large columns with a single cycle versus smaller columns with multiple cycles as illustrated in Fig. 3. Small changes in bed height were accommodated to account for typical ranges seen in industrial applications and it was assumed that such changes would not affect product quality or recovery. The use of multiple parallel columns per step was also incorporated so as to determine whether this offered significant advantages that might outweigh current preferences to avoid parallel columns due to validation burdens. For each step i, the variables define the (i) total volume of resin V i available for the purification of a product at that chromatography step, and the (ii) processing time T i that the chromatography step takes; both parameters are calculated according to standard mass balance equations as follows: 26

Decision variables
where CV BUFF,i and CV LOAD,i are the number of column volumes of buffer and product load applied per cycle, and u i is the linear flowrate of the resin used at step i.

Objective function
The objective f is to find a chromatography sizing setup that yields minimal cost of goods per gram (COG/g) of product manufactured. The COG includes both direct (resource) costs (e.g. resin, buffer and labor costs) and indirect costs (e.g. facility dependent overheads, such as maintenance costs and depreciation), and is divided by the total annual product output P to yield the metric COG/g. The COG/g values are obtained by running a detailed process economics model, which simulates the different purification steps based on mass balance and cost equations as defined by Farid et al. 28 A more detailed explanation of the COG/g components is presented in Simaria et al. 12 A key feature of this model is the impact of processing time on COG/g. The annual demand D is an input of the model and it is used to calculate the bioreactor size assuming a given number of batches is produced annually in the facility. However, if a particular equipment sizing strategy leads to long processing times, it may not be possible to meet the total number of batches and hence the annual product output would be below the production target. This penalizes the objective function through the decrease of the denominator in COG/g (i.e. the total amount of grams of product produced).

Constraints
The problem is subject to two types of constraints: 1. Each chromatography step i = 1, . . . , k needs to satisfy a resin requirement constraint to ensure that the resin volume V i available for purification at step i is sufficient to process the mass of product M i entering that step, given the resin's dynamic binding capacity DBC i and the maximum utilization factor κ. Formally, this constraint can be defined as Solutions violating this constraint are considered infeasible and handled using one of the constraint-handling strategies introduced later. 2. There is also a demand constraint to ensure that the amount of product manufactured P is sufficient to satisfy the annual demand D, or P ≥ D. This constraint may be violated for column sizing strategies with long chromatography processing times T i . Using COG/g as the objective function (recall that the product output P is in the denominator of this metric) was found to be sufficient to cope with this constraint.

Uncertainties
Uncertainty related to the product titer can have a significant impact on the annual product output P. As the equipment sizing is a function of an expected titer value for bioreactors through to chromatography columns, titer fluctuations can cause (i) failure to meet demand (if titer is lower than expected), or (ii) product waste (if titer is higher than expected and equipment capacity is insufficient to process the excess). Other sources of uncertainty (e.g. yield, and processing times) may be present and are realistic but are not considered in this paper.

Search algorithms
To gain insight into the behaviour of evolutionary search algorithms on the chromatography sizing problem, four types of search algorithms were considered: a simple generational genetic algorithm (SGA), a genetic algorithm with generation gap (GA-GG), a genetic algorithm with a (μ + λ)-ES reproduction scheme (GA-ES), and a population of stochastic hill-climbers (PHC). Algorithm 1 shows the pseudocode of the search algorithms and the way the constraint handling strategies (which are introduced below) are managed.
All four algorithms began the search with the same initial population containing μ randomly generated solutions. The algorithms used also the same mutation operator, which selected a decision variable value at random from the set of possible values. SGA used uniform crossover and random flip mutation as the variation operators, and binary tournament selection (with Algorithm 1. Pseudocode of the search algorithms with constraint handling strategies replacement) for parental selection; for environmental selection, it replaced the entire current population with the offspring population. GA-GG and GA-ES differ from SGA in the environmental selection step only. With GA-GG, the new population was formed by selecting the fittest μ solutions from the combined pool of the offspring population and the two fittest solutions of the current population. With GA-ES, a greater degree of elitism was employed and the fittest μ solutions from the combined pool of the current population and the offspring population were selected. PHC maintained a population of stochastic hill-climbers, which, at each generation g, independently underwent mutation and replaced their parent if it was at least as fit. wileyonlinelibrary.com/jctb R. Allmendinger et al.
each candidate solution. The fitness of a solution was then the average of the COG/g values across the m trials, and this average was updated if a solution happened to be evaluated multiple times during an optimization procedure.

Handling of infeasible solutions
Five constraint-handling strategies were analyzed to cope with infeasible solutions (violating Equation 3). Four of them (RS1, RS2, RS3 and RS4) repaired infeasible solutions, i.e. modified the genotype of a solution, while strategy RS5 avoided repairing in some way in order to prevent drift-like effects in the search direction (which may occur due to repairing). The four 'repairing' strategies iteratively increased the values of the decision variables (associated with a particular chromatography step i), one variable at a time, until Equation 3 was satisfied or until the maximum value of a variable was reached, in which case the value of another variable was increased. The sequence in which the variables were modified affected the search. To investigate this effect, different sequences, represented by the strategies RS1 to RS4, were analyzed. The strategy, RS1, applied repairing according to the decision variable sequence d i → n CYC,i → h i → n COL,i (where i is the chromatography step violating Equation 3); this sequence represents typical rules applied in equipment sizing scale-up models. The strategy, RS2, employed the inverse sequence of RS1, i.e. n COL,i → h i → n CYC,i → d i . The strategies, RS3 and RS4, switch between different repairing sequences during an optimization procedure. While RS3 chooses at random between the two sequences employed by RS1 and RS2, the strategy, RS4, chooses at random among all possible repairing sequences (note, there are 4! different sequences to choose from) whenever it needs to be repaired. The approach employed by RS4 would be plausible, for example, if no prior knowledge about promising repairing sequences was available. The strategy, RS5, does not apply repairing but penalizes infeasible solutions by degrading their fitness by a large fixed penalty value c.

Case study description
An industrially-relevant case study was considered to demonstrate the ability of the framework to generate cost-efficient and robust chromatography equipment sizing strategies. The case study focuses on a single-product mAb manufacturing facility that employs a process sequence as shown in Fig. 1 (with k = 3 chromatography steps) to satisfy a total product demand of D = 500 kg/year with an expected titer of 3 g/L. The main details of the chromatography steps used in the model to perform mass balance and cost calculations are presented in Table 1. Titer variabilities were modelled using the triangular probability distribution, Tr  Note: CV = number of column volumes. The experimental study investigated different settings of the parameters involved in the search algorithms. However, if not otherwise stated, the default settings given in Table 2 were used. Any results shown are average results across 20 independent algorithm runs. To allow for a fair comparison of the search strategies, a different seed was used for the random number generator for each EA run but the same seeds for all strategies. This allows for the application of a repeated-measures statistical test, the Friedman test, 29 to investigate performance differences between algorithmic setups.

RESULTS AND DISCUSSION
Before analyzing the behaviour of the search algorithms on the chromatography equipment sizing problem, an indication of the properties of the fitness landscapes spanned by three case study scenarios is given. For this, the adaptive walks method was adopted. 16,30 Starting from a randomly generated solution, an adaptive walk calculates the fitness of all neighbours of the solution that can be generated with a single mutation step, and selects one of the fitter neighbours at random to move to. If there is no fitter neighbour, then the walk has reached a local optimum and terminates. In this study, 1000 adaptive walks (using a fixed titer of 3 g/L) were performed on the landscape of each scenario, and the length of each walk was recorded. Figure 5  ratio tends to decrease the average length and variability of an adaptive walk. That is, the landscape becomes more rugged, or, equivalently, the number of local optima increases. This pattern is due to tighter DSP windows, which cause more solutions to violate the demand constraint and thus makes the problem harder to solve. This also causes an increase in the COG/g values as indicated in Fig. 5(b). The next section presents an analysis of how the search algorithms fared for both the deterministic (using a fixed titer of 3 g/L) and stochastic scenario. as SGA, did not perform well due to the high probability of getting trapped in one of the many local optima of the fitness landscape. Large population sizes converged slowly due to the low number of generations available for optimization. PHC was inferior to random search because the hill-climbers could get trapped in local optima, in which case further improvements were unlikely, while random search kept on generating (at random) new and potentially fitter solutions. (Note, the performance of random search is constant for different values of μ as it depends only on the total number of function evaluations available.) Figure 6(b) investigates the performance impact of the five constraint-handling strategies, RS1 to RS5, when augmented on GA-ES (a similar performance impact was present for the other search algorithms). From the plot it is apparent that the constrainthandling strategy employed had an effect on the convergence speed and the final solution quality. Furthermore, it indicated that a repairing strategy (RS1, RS2, RS3 and RS4) performed better than a non-repairing strategy (RS5). The superior performance of RS1 is due to the fact that the variable d i is modified (increased) first when repairing a solution. Unlike to the other variables, an increase in di is often sufficient to just satisfy the resin requirement constraint without increasing the processing time. obtained with RS2, RS3 and RS4 it can be concluded that if di cannot be changed, then either the variable n CYC,i or h i should be modified to meet the resin requirement constraint.

Stochastic product titer
The performance of the algorithms was then investigated in the presence of uncertain product titers for different values of Monte Carlo trials m. Figure 7 indicates that uncertainty impacts negatively the convergence speed and under certain circumstances also the final solution quality. This impact tends to be less severe as the degree of elitism employed by an algorithm increases (i.e. the performance of GA-ES is less affected than that of SGA). Elitism can help circumventing this issue as it causes a population to converge (quickly) to a (local) optimal region and then exploit this region. However, on the other hand, too much elitism ( Fig. 7(a)) may disturb and prevent the generation of innovative solutions; here, optimization in a stochastic environment using relatively small values of m can yield better performance than optimization in a deterministic environment due to the greater randomness in the search. When the optimizer does not employ elitism ( Fig. 7(b)), however, any additional randomness in the search may be a burden (because it can cause a population to oscillate between different regions of the search space, preventing or slowing down convergence towards promising regions).
In the following, the discussion will be limited to results obtained with GA-ES, as it was the best performing optimizer. The Monte Carlo trial setting m = 25 will be used to account for the trade-off between converging reliably to high quality solutions and being able to escape from suboptimal search regions. Figure 8 shows the sizing strategies for the most expensive chromatography step (i = 1) found by GA-ES for the USP:DSP ratios 1:1 ( Fig. 8(a)), 2:1 ( Fig. 8(b)) and 4:1 (Fig. 8(c)) at the end of the search across 20 independent algorithmic runs. The filled bubble in each plot indicates the best solution found by the EA, and the filled diamond the base case setup, which will be discussed in more detail in the next section. From the plots it is apparent that, when moving from 1USP:1DSP to 4USP:1DSP, the number of cycles tends to decrease as the time available for purification shortens. The EA finds more similar solutions for the scenario 4USP:1DSP than for 1USP:1DSP because the problem is harder to solve, as already indicated in the landscape analysis conducted previously.
Comparison with the base case Figure 9 compares key performance metrics between the best equipment strategy found by GA-EA and the base case setup for chromatography step i = 1 (as this was the most expensive step) for the different ratios of USP:DSP trains in a stochastic optimization environment. The bar chart demonstrates that the EA discovered chromatography column sizing strategies that improved the value of the objective function COG/g relative to the empirical approach to column sizing often adopted by industry (i.e. base case, as described earlier) with savings in COG/g of up to 20% being achieved. The characteristics of the optimal solutions found by the EA depend on the type of scenario being addressed. For 1USP:1DSP, where the DSP window is unconstrained at 14 days, the optimized column sizing strategy employs a smaller column running for more cycles than in the base case. This reduces the amount of resin purchased and thus also the COG/g. In the scenario 4USP:1DSP, where the DSP window is constrained to 4 days, the column sizing strategies are optimized to run faster (i.e. fewer cycles and larger diameters), increasing the output kg produced and thus minimizing COG/g. The use of stochastic EAs permitted the identification of more robust solutions, better equipped to handle titer fluctuations, as shown by the lower standard deviation values of COG/g and reduced product waste (Fig. 9). In the base case approach, the columns are sized according to the expected average titer without specifically accounting for titer fluctuations. Hence, in situations where titers are higher than expected, the columns may not have sufficient excess capacity to cope with higher product loads, leading to product waste. The sizing strategies found by the EA exhibit higher resin volumes and thus overcome this issue to a certain extent.

Incorporation of user preferences for selection of strategies
For all scenarios, the solutions in the final population found by GA-ES have COG/g values that do not differ by more than 3% of each other, and thus can be considered as valid alternatives available to the decision maker. This allows for the consideration of further criteria and preferences when selecting the strategy to be implemented in the real world, as illustrated in Fig. 10. For example, if space limitations and ease of operation considerations restrict the diameter of columns that can be used to d i ≤ 1.6 m, then solutions that do not meet the requirements can be excluded (black bubbles in Fig. 10)   might be the need to narrow the range of variation of the column bed height. The grey bubbles in Fig. 10 represent strategies that would not be feasible if the bed height was required to be in the range 18 ≤ h i ≤ 22 cm. Therefore the white bubbles are the strategies that meet all criteria. The plot illustrates that even with these user preferences, the user still has a set of alternative solutions with similar COG/g values to choose from, thus providing greater flexibility to meet pressures for continuous process improvement.

CONCLUSIONS
This paper has considered a real-world problem concerned with the identification of cost-effective equipment sizing strategies for purification processes (with focus on chromatography steps) which wileyonlinelibrary.com/jctb are core to all bioprocesses. The industrial case study was applied to monoclonal antibodies which represent the fastest growing segment of the pharmaceutical industry where a significant focus is on the need for more cost-effective and robust purification processes for different facility configurations. The application was formulated as a combinatorial closed-loop optimization problem subject to (i) expensive fitness evaluations (relying on time-consuming computer simulations), (ii) multiple dependent decision variables (related to properties of a chromatography column), (iii) constraints (related to resin requirements and product demand targets), and (iv) uncertain parameters (related to variations in product titers, which were accounted for using Monte Carlo simulations). The investigation highlighted that EAs can identify a diverse set of equipment sizing strategies that are more cost-effective than the strategies used in industry. In particular, the analysis demonstrated that an EA performs best when elitism is employed in combination with a small number of Monte Carlo trials, infeasible solutions are handled using a non-trivial repairing strategy, and (when resources are limited) a medium-sized population (a size between 30 ≤ μ ≤ 80) is evolved for a relatively large number of generations. Furthermore, the study illustrated that having available a diverse set of cost-effective sizing strategies (rather than a single one) is beneficial when it comes to account for user preferences and selecting a strategy to be employed in the real world.
Future research will look at extending the equipment sizing problem considered here with relevant decision variables related, for example, to the sequence of a purification process employed and each step's operating conditions. The ultimate goal is to formulate an optimization problem that covers as many relevant aspects arising along the lifecycle of a biopharmaceutical product as possible. The benefit of solving this problem is that decisions regarding the development of a product can be made at an early stage, resulting in potentially significant financial savings.