Fully Atomistic Modeling of Realistic Plasmonic Materials: Assessing the Performance of Iterative Solvers

The fully atomistic modeling of real-size plasmonic nanostructures is computationally demanding, therefore most calculations are limited to small-to-medium sized systems. However, plasmonic properties strongly depend on the actual shape and size of the samples. In this paper we substantially extend the applicability of classical, fully atomistic approaches by exploiting state-of-the-art numerical iterative Krylov-based techniques. In particular, we focus on the recently developed $\omega$FQ model, when specified to carbon nanotubes, graphene-based nanostructures and metal nanoparticles. The performance of Generalized Minimal Residual (GMRES) and Quasi-Minimum Residual (QMR) algorithms is studied, with special emphasis on the dependence of the convergence rate on the dimension of the structures (up to 1 million atoms) and the physical parameters entering the definition of the atomistic approach.


I. INTRODUCTION
Nanoplasmonics is an emerging field that has been significantly developed in the last decade. 1,2 The optical properties of electron-free nanomaterials (e.g. metallic nanoparticles or graphene) are characterized by the rise of surface plasmons, which are coherent oscillations of the conduction electrons induced by an incoming radiation. 3 One of the peculiarities of such systems is that their plasmon resonance frequency (PRF) can be tuned as a function of the shape, size and supramolecular structure. [4][5][6][7] In the particular case of graphene, PRF can also be tuned by exploiting electrical gating and chemical doping, which result in a modification of its Fermi energy, thus giving rise to many diverse applications, also in the technological field. 8 A peculiarity of plasmonic substrate is that their optical response strongly depends on the system's size. In fact, small nanostructures do not experimentally exhibit the properties of larger structures. 9 Therefore, only the latter are actually exploited in real applications. These features strongly limit the role of modeling through full quantum mechanical (QM) descriptions, that are impracticable for real-size systems. For this reason, plasmonic structures are generally described by resorting to classical approaches [10][11][12][13][14][15][16][17][18][19] , and in particular through implicit models that describe the nanostructure as a continuum, and its optical properties in terms of the frequency-dependent complex permittivity. [20][21][22][23][24][25][26] Clearly, continuum methods lack any atomistic description of the substrates, which can instead be retained by using explicit, atomistic approaches, where the optical response directly arises from atomic parameters, as for instance computed frequencydependent atomic complex polarizabilities. [10][11][12][13][14][15][16]27,28 On the other hand, continuum approaches are not computationally demanding, because the Maxwell equations are solved on the surface of the nanostructure, while atomistic models require the treatment of all atoms, therefore the cost of the calculation scales as a function of the volume. This is the price to pay to describe for instance doping 29,30 and structural defects 31 , a) Electronic mail: chiara.cappelli@sns.it which cannot be modelled by continuum approaches.
Due to the unfavourable scaling of classical atomistic models with the number of atoms, the calculation of the optical properties of real size structures, which are constituted by millions of atoms, is a hard task. Regardless of the specific discrete model, in fact, the calculation of the plasmonic response involves the solution of a number of complex-valued, frequency-dependent linear systems, with a dense coefficient matrix of order equal to the number of atoms. The solution of the linear system is challenging in terms of both memory requirements and computational cost, therefore the investigation of the performance and applicability of alternative numerical methods to approach this problem is particularly relevant and timely.
In this work we apply different numerical techniques to solve the problem. In particular, we model the optical response of large graphene-based structures by exploiting a fully atomistic, yet classical approach that has been recently developed by some of us, named ωFQ. 27,28,32 There, each atom of the substrate is endowed with a charge, whose value is determined by solving a complex-valued linear system. The charge exchange between the atoms is governed by the Drude mechanism and by quantum tunneling. ωFQ has recently been shown to be successfully applicable to both graphene-based materials and plasmonic metal nanoparticles, for which accurate results are obtained. However, similarly to all atomistic approaches, the solution of the ωFQ linear system by means of direct methods based on the LU factorization of the system matrix scales unfavourably with the size of the nanostructure, therefore it is an ideal playground to test the performance of various solution algorithms based on Krylov subspace iterative methods, 33 such as the Generalized Minimal Residual (GMRES) and Quasi-Minimum Residual (QMR) algorithms. 34,35 In particular, GMRES and QMR computational timing and convergence rates are compared with the direct inversion solution, which is also much more demanding in terms of memory requirements.
The manuscript is organized as follows. In the next section, we briefly recap the fundamentals of ωFQ, by especially focusing on the mathematical properties of the ωFQ linear system. Similarities and differences with continuum approaches are discussed, and Krylov-based methods are presented. We then apply the newly developed algorithms to the modeling of the plasmonic properties of selected nanostructures, of size up to hundreds of nanometer and constituted by roughly one million atoms. A summary and an overview of future developments end the manuscript.

II. THEORETICAL MODEL
ωFQ is a fully atomistic, classical model that describes the response of a system, such as nanoparticles or graphenebased nanostructures, to the external electric field E. 27 In particular, each atom of the system is endowed with a charge. Charge exchange between different atoms is governed by the Drude mechanism of conduction, 36 and modulated by quantum tunneling. 27 The key equation for solving the charges q reads: is the electric potential acting on the i-th charge associated with the external electric field oscillating at frequency ω, D i j is the charge interaction kernel and K tot i j is a matrix accounting for both Drude and tunneling mechanisms.
More in detail, the linear system in eq. (1) describes the response of a set of N complex-valued charges q j under the effect of an external monochromatic uniform electric field of frequency ω polarized along thek direction, withk =x,ŷ,ẑ. The associated potential V ext on each atom, entering the righthand side of eq. (1), is defined as: where E k 0 is the intensity of the electric field along the k direction, r i is the position of the i-th charge and k i is the component of r i along thek axis, i.e. k i = r i ·k. The matrix D on the left-hand side of eq. (1) describes the electrostatic interaction between the charges, and it is defined in the standard formulation of the FQ force field exploited for treating molecular systems. 37,38 In order to avoid the so-called "polarization catastrophe", 10 instead of point charge we use spherical Gaussian charge distributions of width d i and d j to describe ωFQ charges. D elements then read: 27,[39][40][41] where r i j = r i − r j is the distance between charges i-th and j-th, erf is the error function and η i is the atomic chemical hardness of the i-th atom. 39,42 Gaussian widths d i and d j are chosen for each atom by imposing that the limit for r i → r j corresponds to the diagonal element of the matrix, i.e.: lim The D matrix can be formally seen as an overlap matrix defined in the scalar product weighted by the 1 r function. Therefore, it is symmetric positive definite (SPD). 39 The K tot matrix in eq. (1) reads: where n 0 is the electron density, τ is a friction-like constant due to scattering events and A i j is an effective area dividing atoms i and j. f is a Fermi-like damping function, defined as: in which r 0 i j is the equilibrium distance between atoms i and j, while d and s are parameters ruling the shape of the damping function. K tot is a frequency-dependent symmetric complexvalued matrix, and it can be interpreted as a "dynamic" response matrix, whereas the D matrix describes the "static" response. It is worth noticing the expression of K tot can be associated with two alternative response regimes. When f (r i j ) goes to zero, the purely Drude conductive regime is recovered; as r i j increases, the electron transfer descreases exponentially, thus leading to the typical tunnelling mechanism. 27 The diagonal elements of K tot do not enter eq. (1), but the notation can be simplified by imposing K tot ii = 0 for all i = 1, . . . , N and extending the summations over k and j in eq. (1) to all N atoms of the system.
As a final remark, the electron density n 0 , that appears in eq. (5), is a specific property of the chemical composition of the plasmonic substrate and of the shape of the system. In a general 3D system, n 0 can be expressed as n 0 = σ 0 /τ m , where σ 0 is the static conductance of the material, while m is its effective electron mass, which can be approximated to 1 for metal nanoparticles. However, in case of graphene-based materials, such as graphene sheets or carbon nanotubes, the effective electron mass needs to be taken into account. 28 In graphene-based materials, m can be expressed as where n 2D is the 2D electron density of the system and v F is the Fermi velocity. 8 The latter is related to the Fermi energy ε F through the expression v F = 2ε F m 0 , where m 0 is the electron rest mass. 8 The 2D electron density n 2D can be calculated from n 0 as n 2D = n 0 · a 0 , with a 0 being the Bohr radius. 28 Then, the 2D electron density can be calculated as the ratio of the number of atoms N and the surface of the system S, i.e.: where α is a parameter (< 1) which selects the fraction of π electrons which are involved in the studied plasmonic excitation. 28 We note that such a parameter is uniquely determined by the value of ε F . 28 A. Properties of the ωFQ linear system In this section, we analyze the mathematical properties of the ωFQ linear system defined in eq. (1). We first notice that in eq. (5) the complex frequency-dependent ratio describing the Drude model can be gathered: where K tot is a symmetric real-valued matrix. The frequencydependent complex factor w(ω) is always non-zero, so we can take it out from eq. (1): (10) At this point we can introduce the following notation: and eq. (10) can be expressed in vector notation as where I is the N-dimensional identity matrix. It can be noted that eq. (14) is fully equivalent to eq. (1), but A this time is a real-valued frequency-independent nonsymmetric matrix, of which the diagonal elements are shifted by a complex quantity. Moreover, the A matrix defined in eq. (11) can be rewritten as: By introducing a diagonal matrix P il = ∑ N k=1 K tot ik δ il where δ il is the Kronecker delta, we can write: and plugging the definition into eq. (15) we obtain: Therefore, the A matrix can be formulated as the product of two real-valued symmetric matrices, since K tot and D are symmetric and P is diagonal. However, A is a nonsymmetric matrix because D and K tot − P in general do not commute. Nevertheless, the following equality holds: where T indicates the transposition operator. Recalling that D is an SPD matrix, 39 we can define the D-inner product as: where ·, · is the standard Euclidean inner product. From eq. (18) and eq. (19) it can be demostrated that A is selfadjoint with respect to the D-inner product, i.e.: Ax, y D = DAx, y = A T Dx, y = Dx, Ay = x, Ay D .
(20) Equation (20) allows us to conclude that even if A is a nonsymmetric matrix, it is self-adjoint with respect to the inner product induced by the SPD matrix D, therefore A is diagonalizable with real eigenvalues.
As a final remark, the alternative expression of the A matrix in eq. (17) allows us to derive another property of the matrix itself. In fact, K tot − P is such that each row (or column) sums up to zero, therefore it is a singular matrix. By this, the matrix A is also singular. Nevertheless, the existence and uniqueness of the linear system solution in eq. (14) is guaranteed through the diagonal shift of the coefficient matrix with the complex scalar z(ω) defined in eq. (12), which is non-zero when ω = 0. However, numerical instabilities in the solution of the linear system can arise when ω approaches zero because A is close to singular.

B. Comparison with continuum approaches
The atomistic nature of ωFQ emerges from all the variables that enter eq. (1): charge positions, chemical hardnesses in eq. (3), effective areas in eq. (5) and equilibrium distances in eq. (6), as well as electronic features of the material that enter the Drude model. Such an approach allows us to describe the (macroscopic) plasmonic response of the system in terms of (microscopic) atomistic quantities, regardless of the shape of the system. Therefore, complex effects associated with surface roughness and edge effects are automatically considered.
As stated in the Introduction, the plasmonic response of complex systems can be described by resorting to continuum approaches, such as the Boundary Element Method (BEM). 24,43 There, the material is treated as a continuum and its electronic properties are synthesized by its frequencydependent dielectric permittivity function ε(ω). The plasmonic response arises as a surface charge density σ (r), which is computed by solving Maxwell equations, via a reformulation as a boundary integral equation on the material surface. From the computational point of view, the latter is discretized in N surface elements centered at positions r i , with i = 1, . . . , N. At the same time, the surface charge density σ (r) is discretized in terms of N electric charges. The equation for solving the charges in BEM reads: 24 : where ε out , ε in are the frequency-dependent complex-valued dielectric permittivity functions of the outer (usually vacuo) and inner (the material) region, respectively, σ i = σ (r i ) is the electric charge evaluated on the surface element at position r i , while φ i = ∂ φ (r i ) ∂ n is the surface derivative of the external potential φ at position r i . Moreover, I is the N-dimensional identity matrix and F i j is the normal derivative of the Green function, i.e.: wheren i is the normal vector to the surface at point r i . ωFQ and BEM are genuinely different in terms of performance and versatility: surface roughness can easily be taken into account through an atomistic approach, while the continuum model needs specific treatments such as perturbative expansions. 44 Nevertheless, from the purely algebraic point of view, there are some similarities. First of all, eqs. (14) and (21) have the same structure. In both cases, charges are obtained by solving a dense system of linear equations, in which the left-hand side is written in terms of a nonsymmetric frequency-independent real-valued matrix [A for ωFQ (see eq. (11) and F for BEM (see eq. (22))] with a uniform complex-valued diagonal shift, which describes the electronic properties of the material at a specific frequency (z(ω) defined in eq. (12) for ωFQ and the permittivity in eq. (21) for BEM). Moreover, it has been shown that the F matrix in eq. (22) is singular and diagonalizable with real eigenvalues 24,[45][46][47] , similarly to the A matrix in eq. (11) (see section II A for the proof). Therefore, the same computational techniques to solve the linear system, which in this paper are described for the ωFQ approach, can also be exploited for BEM.

III. SOLUTION STRATEGIES
In order to model the optical spectra of plasmonic substrates, eq. (1), or equivalently eq. (14), need to be solved for a certain number of frequencies ω. This can effectively achieved only by resorting to efficient methods to solve the dense nonsymmetric complex linear system. This point is crucial, especially when the dimensionality of the system increases. Direct techniques of solution (e.g. based on factorization of the coefficient matrix) are to be avoided, because they are inefficient both in terms of storage demand and computational cost, which scales as N 3 . Thus, matrix-free iterative techniques, which in our implementation scale as N 2 , are a promising alternative. In particular, such approaches can be implemented without necessarily storing in memory the whole matrix A − z(ω)I, but in terms of matrix-vector products which can efficiently be computed in a parallel environment.
One of the most powerful techniques to solve a linear system Ax = b (with a generic matrix A) of order N is to resort to Krylov subspace iterative methods. 48,49 The idea behind this family of methods is to build an approximate solution to the linear system at step m in the m-dimensional affine subspace where r 0 is the residual associated with the initial guess x 0 .
In this work, two different Krylov-based iterative methods have been tested for the solution of eq. (14), namely the Generalized Minimum RESidual algorithm (GMRES) 34 and the Quasi Minimal Residual (QMR) method. 35 GMRES. It is a general approach to nonsymmetric linear systems. 34 At each step m, an approximate solution of the linear system is obtained as: where V m is an orthonormal basis of the m-dimensional Krylov space K m (A, r 0 ). The vector y m is determined such that the 2-norm of the residual r m = b − Ax m is minimal over K m . The orthonormal basis of K m is obtained via the Arnoldi process 50 , and the orthogonal projection of A onto K m leads to an upper Hessenberg matrix H m = V † m AV m . 51 Therefore the least squares problem can be efficiently solved through a QR factorization of H m . 51 The QR decomposition of H m can be updated cheaply on each iteration, but at each step a new vector must be stored, so the memory cost is not constant during the iterative procedure. 51 In order to reduce the memory required by GMRES, the so-called "restarted" GMRES algorithm has been developed, also known as GMRES(k) 34,52 . There, the iterative procedure is stopped after k steps, and the GMRES algorithm is restarted by using the last iterative vector x k as the new initial guess vector from which the Krylov subspace is built once again. By this, no more than k vectors are stored in memory at the same time, however the algorithm is expected to converge more slowly than standard GMRES . 52 QMR. Similarly to GMRES, QMR has been developed for solving nonsymmetric linear systems. In this case, the coupled two-term Lanczos algorithm is adopted to generate two Krylov spaces K m (A, p 0 ) and K m (A T , q 0 ) where p 0 , q 0 are two initial vectors. 53 At each step, an approximate solution to the complex-valued linear system is defined as: which is similar to what is done by GMRES(see eq. (24)). In fact, P m is the basis set of the Krylov space K m (A, p 0 ), while the y m vector is associated with an approximate 2-norm of the residual. 53 In the QMR formalism, the residual r m = b − Ax m can be written as where the matrices M m , N m and the vector f m are obtained through the basis sets previously defined. 53 In QMR, the vector y m is chosen so to minimize the quantity in brackets in eq. (26). In other words, y m is the solution of the least squares problem min y m f m − N m y m . Such a procedure yields a "quasi-minimization" of the residual, therefore it is expected to converge more slowly than GMRES. On the other hand, at each step a fixed number of vectors need to be calculated, thus memory requirements are constant along the whole iterative procedure. Note that we have considered the QMR algorithm because a simplified version has been proposed by Freund and Nachtigal in case of J-symmetric coefficient matrices, i.e. such that A T J = JA for a SPD matrix J. 54 This property, which we have demonstrated for the ωFQ matrix in section II A (with J = D), allows us to simplify the Lanczos process by choosing the starting vectors such that q 0 = Jp 0 . In this way, the computational effort to compute the basis sets of the involved Krylov space is reduced, because the calculation of the matrix-vector product A T x is not needed. As a final remark, in this work the Krylov iterative methods have been applied without resorting to preconditioning techniques. Some basic preconditioners (e.g. the diagonal of A − zI) have been tested, without yielding any improvement in the convergence rate.

IV. NUMERICAL RESULTS
The GMRES and QMR algorithms for the solution of the complex ωFQ linear system in eq. (14) have been implemented in a standalone FORTRAN95 code, named nanoFQ, in a parallel environment through the OPENMP application programming interface (API). 55 To apply complex GMRES, nanoFQ has been interfaced with a public domain software developed by Frayssé and co-workers. 56 The QMR-from-BiConjugate Gradients (BCG) algorithm without look-ahead for J-symmetric matrices 54,57 has been implemented from scratch. All calculations have been performed on a Xeon Gold 5120 (56 cores, 2.2 GHz) cluster node equipped with 128 GB RAM, if not stated otherwise.
The performance of GMRES and QMR algorithms has been computationally compared by calculating the number of iterations (NI) required to converge the solution of the linear system to a pre-defined threshold. The 2-norm of the residual vector has been used as a convergence criterion, where q k is the vector generated at the k-th iterative step and T is a user-defined threshold.
The ωFQ approach has been applied to the prediction of the optical properties of selected chiral carbon nanotubes (CNT) and graphene disks (GD) (see fig. 1 for their molecular structures). For both systems, different geometries have been generated by modifying the length L and/or the diameter d C for CNTs, and the diameter d D for GDs. The total number of atoms in the studied structures varies from 8208 to 49248. It is worth remarking that due to the cutting procedure adopted to construct the GDs, dangling bonds possibly occurring on the edges of the disks may be retained (see fig. 1b). However, they do not affect the optical properties of large systems (see Fig. S1 given as Supplementary Material -SM). Thus, they can be retained without affecting computed properties and the convergence rate of the two algorithms.
In the following, we study the dependence of NI on: • electronic parameters, such as the relaxation time τ and the Fermi energy ε F that enter eqs. (5) and (7), respectively; • external field frequency ω, which enters eq. (14) through the z(ω) coefficient defined in eq. (12); • geometry of the systems, in particular the GD diameter and the CNT length/diameter (see fig. 1); • iterative algorithm, i.e. QMR, GMRES or restarted GMRES(k).
Since we are dealing with iterative procedures, also the choice of the initial vector q 0 (see eq. (27)) can strongly affect the NI. Therefore, in order to compare the different algorithms reliably and in a reproducible way, we choose q 0 = 0 in all cases.

A. Electronic parameters
We first analyze the dependence of NI on the electronic parameters that enter the definition of ωFQ model, i.e. the Fermi energy ε F (see eq. (7)) and the relaxation time τ (see eq. (5)). The plasmonic response of GD varies as a function of both τ and ε F . 9,28,58,59 To analyze such a response, we consider the longitudinal absorption cross section σ k , that can be calculated as: where c is the speed of light, k i is the position of the i-th charge along thek axis and E k 0 is the k-th component of the intensity of the external electric field. Im(q k i (ω)) is the imaginary part of the i-th charge induced by an external electric field polarized along thek axis with frequency ω. The isotropic absorption cross section σ can be calculated by averaging the longitudinal one along the three axes, i.e.: where we have introduced a compact vector notation in terms of the scalar product between the imaginary part of the charges and their positions. It has been shown by some of the present authors 28 that PRFs, i.e. frequencies corresponding to σ maxima, are independent of τ. In fact, the latter only affects the excitation peak broadening and amplitude, that scale with τ and 1 τ , respectively.
The dependence of NI on τ and ε F has been studied for a GD with d D = 20 nm (GD20), which is constituted by 11970 carbon atoms. The full (i.e. non-restarted) GMRES NI has been calculated on 200 frequencies in the range between 0.0 eV and 2.0 eV with a constant step of 0.01 eV. The convergence threshold T has been fixed to 10 −6 a.u. (see eq. (27)). ωFQ parameters have been set to those exploited in Ref. 28,59 The ωFQ linear system has been solved with the GMRES algorithm by using ε F = 1.51 eV and τ = 17000 a.u. The computed σ X (ω) and NI are reported in fig. 2, where the absorption cross section has been scaled to make all peaks visible (see Fig. S2 in the SM). By varying the external field frequency, σ X shows a set of local maxima of different amplitudes. Since all calculations have been performed with a constant step of 0.01 eV, each PRF has an intrinsic error of 0.01 eV. Similar errors can also affect the relative intensity of the local maxima: the absorption peak is extremely sharp when τ is large, thus a small variation in frequency induces a large variation in intensity.
Similarly to σ X , the NI plot is characterized by a distribution of local maxima, with the same intrinsic error of PRFs. From an inspection of fig. 2, a strong correlation between the two sets of local maximum points is observed, and this is es-pecially true for the most significant maxima highlighted in fig. 2. This result is not surprising: from eq. (28) it is expected that a local maximum of σ X (ω) is necessarily associated with a local maximum (in absolute value) of ωFQ point charges. The charge densities associated with σ X (ω) highlighted local maxima are plotted in fig. 3, and they clearly represent plasmon modes of increasing order. In fact, the number of nodes (N nodes ) is always odd for symmetry reason, and increases as frequency increases. 36 Since the iterative procedure starts from the q 0 = 0 vector, when the distance between the guess and the solution vectors increases, NI increases. Moving to the global trend of the NI (see fig. 2), the required number of iterations is small for the first PRF at ω = 0.3 eV, then the NI increases in the middle region of the spectrum and finally decreases. Two underlaying mechanisms may explain this peculiar trend. First, the lowest-order plasmon modes (e.g. the dipolar one at 0.3 eV) are strongly localized on the edge of the system (see fig. 3). Therefore a small number of large-valued point charges is involved in the excitation, but most charges are instead close to zero (e.g. those placed in the middle of the structure). By this, the guess vector q 0 = 0 a.u. is a satisfactory starting point for the iterative procedure, and a small number of iterations is sufficient to obtain the solution vector. This is not true for the highest-order plasmon modes, that are instead delocalized all over the system (see fig. 3). In addition, the plasmonic response intensity (i.e. the point charges absolute value) strongly decreases when the excitation order increases (see fig. 2, top), because the number of nodes in the plasmon mode is larger. This is also evinced by the isovalue used to plot the densities in fig. 3, which decreases as the PRF increases. Therefore, the NI in correspondence to the highest order plasmon modes tends to decrease. The presence of low-amplitude local maxima, represented by dashed lines in fig. 2, top panel, can be attributed to numerical artifacts (see Fig. S3 in the SM).
The dependence of NI and σ X on τ and ε F is reported in figs. 4 and 6, respectively. Focusing on the dependence on τ ( fig. 4), we see that the PRF is not affected by this parameter, as it has been already reported in previous works. 28 The main effect of the variation of τ is the shrinking of the excitation band shape, and the associated increase of intensity. In the energy region between 0.5 eV and 1.5 eV NI increases with τ, and new local maxima in both σ X and NI at τ = 680 a.u. arise. These local maxima are associated to the high-order plasmon resonance modes identified in fig. 2 and represented in fig. 3. The most relevant plasmon resonance mode is the dipolar excitation, because it is generally associated with the highest amplitude and the lowest PRF, which make it the most suitable for physical applications. 60 A smaller value of τ can be adopted to achieve a reliable description of this excitation. In fig. 5, GD20 σ X calculated by setting τ = 17000 a.u. and τ = 170 a.u., and ε F = 1.51 eV are reported. Claerly, the first plasmon excitation with PRF=0.3 eV is the most intense for both τ values. For τ = 170 a.u., σ X is characterized by a single maximum since the reduction of the relaxation time induces also a proportional reduction of the plasmon excitation intensities. 28 By this, we can conclude that τ = 170 a.u. can be chosen in order to reduce the computational cost of the iterative procedure, since we are mostly interested in the description of the dipolar excitation. The dependence of NI on ε F is reported in fig. 6. We see that the increase of ε F results in a blue shift of the PRF, and in the increase of the absorption intensity. In fact, a higher value of ε F is associated with an increase of n 2D (see eq. (7)), because a higher fraction of π electrons are involved in the excitation. The NI shows a similar trend, because the global maximum is blue shifted and the required number of iterations increases.

B. Geometry
In this section, we investigate the dependence of NI on the geometry of the system. The same CNT and GD structures investigated in the previous section have been selected, for which we have varied the characteristic dimensions (see Fig.  1).

FIG. 6: GD20 σ X (top) and NI (bottom) as a function of the
Fermi energy ε F (given in eV).
CNT.   fig. 1a for their definition). The number of atoms for each structure is also given.
in eq. (14) has been solved for 200 frequencies in the range between 0.0 eV and 2.0 eV with a constant step of 0.01 eV, by setting ε F = 1.04 eV and τ = 170 a.u., respectively. First, we comment the results obtained by fixing d C = 1.36 nm and by varying L from 50 to 300 nm (see Table I, top block). The data are shown in fig. 7 for both GMRES and QMR algorithms, in case the external field is aligned along the transversal (X) or longitudinal (Z) directions. We are then assuming that the two possible transversal directions (X and Y) provide the same polarization, even if all the considered CNTs are chiral. In fact, the differences in the plasmonic response along the two di-rections are negligible (see Fig. S4 in the SM). panel, shows that the NI along the transversal direction is the same for all systems, because the diameter is kept constant. In case of longitudinal polarization, the number of iterative steps increases as the length of the system increases in the low energy region of the spectrum, for both GMRES and QMR. This is due to the fact that PRFs are red-shifted as L increases approaching 0 eV (see Tab. S1 in the SM). The z(ω) factor in eq. (12) is therefore close to 0, and since the A matrix in eq. (14) is singular, the number of iterations increases due to increased ill-conditioning. We now move to comment the results obtained by varying the CNT d C , by keeping constant L = 50 nm (see Table I, bottom block). Computed GMRES and QMR NI for such systems are reported in fig. 8. Differently from the previous case, the NI trend does not show a strong dependence on d C , for both transversal and longitudinal directions of the applied electric field. This is related to the fact that, although PRF energies are red-shifted as d C increases (along the X direction), the smallest PRF associated with the dipolar plasmon is far from 0 eV (0.37 eV for CNT4, see Tab. S1 in the SM). Therefore, in this case severe ill-conditioning is avoided and the NI remains almost constant.
As a final comment, we note that GMRES and QMR provide almost the same convergence rate. In particular, QMR NI is usually slightly higher than GMRES, thus confirming what has been observed in section III (see also Fig. S5 in the SM).
GD. Let us now focus on the NI calculated for four different GDs, obtained by varying the d D diameter (see fig. 1b). Geometrical parameters are reported in Table II fig. 1b for their definition). The number of atoms for each structure is also given.
For each structure, NI has been calculated for GMRES and QMR, and the results are reported in fig. 9. Interestingly, the NI presents a weak dependence on the d D diameter. In fact, the largest difference in the number of iterations is of about 10 between GD36 and GD20. However, GD36 has almost four times the atoms of GD20, thus demonstrating the favourable scalability of the two algorithms. We finally note that also in this case the PRFs are red-shifted as the size of the system increases (see Tab. S1 in the SM).

C. Algorithm parameters
We now move to consider the technical parameters associated with the iterative procedure, and how they affect the convergence rate. We first study the performance of GMRES(k) (see section III), by taking as a reference system the CNT300 structure (see table II) and exploiting the the same parameters used above for CNTs for solving the ωFQ linear system. The iterative procedure has been performed by varying k (between 20 and 80), and by keeping the threshold fixed to T = 10 −6 . Computed NI are reported in fig. 10, together with the corresponding results obtained with the full GMRES (F.G) algorithm (i.e. non-restarted). For X and Y polarizations, the reduction of the Krylov subspace does not affect the NI behaviour. On the contrary, for Z polarization GMRES(k) and F.G. procedures yield different NI trends in the region between 0 and 0.2 eV. In fact, GMRES(k) requires a larger number of iterations than F.G. version to reach convergence, independently of the dimension of the Krylov subspace k. This is once again due to singularities arising when PRF approaches to 0 eV, which yield ill-conditioning that is exacerbated in the restarted version of the algorithm. Such an explanation is corroborated by the evidence that the number of iterations required to reach convergence decreases by increasing the dimension of the Krylov subspace k. For each k, we also notice that GMRES(k) is almost as efficient as the F.G. procedure in the remaining part of the spectrum. Therefore, in case the PRF is far from 0 eV, the same results can be obtained by using a cheaper iterative procedure in terms of memory requirements, because a smaller number of Krylov basis vectors have to be stored to build up the solution vector. Another quantity that can strongly affect the NI is the threshold T defined in eq. (27), which in all previous calculations has been fixed to 10 −6 . We notice however that T is independent of the size of the system. Therefore, we can expect that the mean precision of the iterative solution, averaged for each charge, is higher when the number of atoms increases. In fact, eq. (27) can be rewritten as: If we assume that the absolute error is the same for each point charge, i.e. R i − ((A − z(ω)I)q k ) i = δ q, and we plug this ap- proximation in eq. (30) we obtain: Therefore, for a given threshold T the absolute error on each charge decreases when the number of atoms N increases. In order to obtain a size-independent estimate of the accuracy of the iterative solution over all the ωFQ charges, we can introduce the following definition of root mean squared error (RMSE): Then, we can define a new convergence criterion as: We can now investigate the accuracy of the iterative solution by adopting different RMSE thresholds. As a precision measure, we consider the longitudinal absorption cross section σ X calculated for a set of GDs (GD20, GD26, GD32, GD36 in table II) applying an electric field with a polarization vector laying on the molecular plane. The linear system has been solved with both GMRES and a direct procedure, i.e. an LU factorization of the coefficient matrix. 51 For each selected RMSE value, the σ X relative error between GMRES and LU factorization averaged over all the considered frequencies (0.0 eV to 2.0 eV, with a step of 0.1 eV) has been calculated, and the results are graphically depicted in fig. 11. An approximate upper bound of the intrinsic precision associated with the factorization algorithm is also plotted (see section S1 in the SM).
It can be seen that the accuracy of the iterative solution for the different systems is almost constant for a specific RMSE value. In particular, by imposing RMSE≤ 10 −4 , the correct order of magnitude obtained by using the LU solution can be recovered by the iterative procedure, i.e. the relative error is ≤ 10 −1 . To further demonstrate that the RMSE criterion is effectively size-independent, we performed the same analysis discussed above also for the CNT case (see Fig. S6 in the SM).
We finally move to discuss the computational time required by the different choices of the RMSE value. In particular, such an analysis has been performed on the four GDs reported in Table II. The computational time required to solve the ωFQ linear system for 20 frequencies in the range between 0.0 and 2.0 eV with a step of 0.1 eV are given in fig. 12 (raw data can be found in Tabs. S2-S5 in the SM). Both direct solution (LU factorization, see section S2 in the SM) and the full GMRES algorithm with different choices of RMSE have been considered. Notice that we are showing the computer time required by the solution of the ωFQ linear system. This means that the coefficient matrix A (eq. (11)) and right-hand side (eq. (13)) construction are not taken into account, in order to allow for a direct comparison with the factorization algorithm. All calculations have been performed on a Xeon Gold 5120 (56 cores, 2.2 GHz) cluster node equipped with 256 GB RAM.
In fig. 12, the average computational time is reported (top, left panel) together with the time required to solve the ωFQ linear system for three selected frequencies, i.e. 0.1 (lower bound), 0.7 and 2.0 eV (upper bound). ω = 0.7 eV has been chosen because it corresponds to the maximum number of iterations required by full GMRES to converge (see also fig. 9). Figure 12 clearly shows that the direct solution through LU factorization and the GMRES iterative procedure intrinsically differ in terms of the scaling with the dimension of the linear system (i.e. the number of atoms). In fact, the former has a complexity of O(N 3 ) while the latter scales as O(N 2 ). Such a difference is highlighted by the computational times in fig. 12: the increase of the computational time for the direct solution (gray line) is larger with respect to the iterative procedure when the number of atoms increases, independently of the RMSE value exploited in the GMRES algorithm. Also, we note that, differently from the LU-based algorithm, the computational time required by the iterative method is not constant across the frequency range. This can be explained by the fact that the number of iterations required to converge to the solution depends on the external frequency (see also fig. 9). Finally, we remark that even at 0.7 eV GMRES is more efficient than the inversion algorithm. In particular, a good compromise between accuracy and computational efficiency can be reached by using RMSE= 10 −5 , for which the computational time of the inversion solution can be reduced by a factor of 10 for the largest studied structure.
The computational time analysis has been performed also for the QMR algorithm (see section S2 and Tabs. S6-S9 in the SM). From an inspection of the numerical results, it emerges that, for a given RMSE, QMR requires roughly twice the computational time than GMRES. In fact, as it has been shown above, GMRES and QMR need a similar number of itera- tions to converge; however, for each iteration GMRES performs a single matrix-vector product, while QMR computes two matrix-vector products (one with the coefficient matrix, and one with the D matrix defined in eq. (3)). Therefore, although the computational and memory costs of GMRES are not fixed during the iterative procedure, it overperforms QMR because of the lower number of matrix-vector products that are needed to build the Krylov subspace.

D. Large systems
To finally demonstrate the robustness of the developed iterative methods to solve the ωFQ linear system, we investigated the plasmonic response of real-size systems, composed of roughly one million atoms. When dealing with large-sized structures, two main issues arise. From the theoretical point of view, the quasi-static approximation on which ωFQ equations are based could be no longer valid. From the technical point of view, when the number of atoms increases the storing of the ωFQ matrix in physical memory can rapidly become unfeasible. Such a problem can be handled by adopting a matrix-free version of the GMRES algorithm, where the A matrix in eq. (14) is not explicitly built. In fact, the iterative algorithm only requires to calculate the matrix-vector product Ax, which can be performed on-the-fly during the execution of the program. This means that at each iterative step k, the new Krylov basis vector is obtained as: where the element (i, j) of the matrix A − z(ω)I is calculated when required by the algorithm. On the other hand, each matrix element has to be calculated from scratch, therefore the on-the-fly version of GMRES would require larger computational time, without affecting the number of iterations. Nevertheless, the on-the-fly matrix-vector product can be efficiently calculated in a parallel environment and memory re-quirements are negligible with respect to the standard GM-RES procedure, because only the iterative vectors should be kept in memory during the solution procedure.
To showcase the performance of GMRES when applied to large systems, we have selected three structures composed by roughly 1 million atoms: a carbon nanotube -CNT1M-, a graphene disk -GD1M-and a sodium nanorod -NR1M-(see Tab. S10 given in the SM, for geometrical parameters). The latter is genuinely different from the other two structures and has been selected to further demonstrate the reliability of the method to study the optical properties of metal nanostructures.
For each of the constructed structures, the longitudinal absorption cross sections and the NI have been calculated. We have set RMSE to 10 −5 , which has been proved in the previous section to be a good compromise between accuracy and the computational cost.
CNT1M. The calculations on CNT1M have been performed by applying an external field along the transversal and longitudinal directions, at 35 different frequencies in the range between 0.0 and 0.45 eV, by setting τ = 170 a.u. and ε F = 1.03 eV. The longitudinal absorption cross sections and the NI are reported in fig. 13. The transversal PRF is placed at about 0.38 eV, which is close to the value for CNT4, which has the same diameter (see table I). The longitudinal PRF is instead placed at 0.02 eV, which is smaller than the value for CNT300, which has a length of 300 nm. This is not surprising because the PRF is red-shifted as the length of the system increases. The required number of iterations as a function of the external field frequency is reported in fig. 13. We note that the maximum number of iterations is 80, which is lower than what we have obtained for CNT300 (see section IV B). This is due to the larger convergence threshold chosen for the iterative procedure; overall, a mean value of about 30 iterations is sufficient to reach the convergence. Since the number of iterations is modest, restarted GMRES has not been considered for such large systems. Moving on to discuss the computational time, our implementation permits to calculate about 4 matrixvector product per hour, thus resulting in a total time of about 487 hours.
GD1M. The ωFQ linear system has been solved for GD1M with an external field polarization vector laying on the GD plane (X), at 15 different frequencies in the range between 0.05 and 0.19 eV with a constant step of 0.01 eV. We set τ = 170 a.u. and ε F = 1.84 eV. The computed σ X and NI are reported in fig. 14. The transversal dipolar PRF for this system lays at 0.12 eV, which is smaller than the values for the smallest GDs studied in the previous sections. The required number of iterations to reach convergence is about 30 for each external field frequency, i.e. smaller than what is required by GDs described in section IV B. As it has been stated for CNT, this is mainly due to the setting of a larger RMSE threshold.
NR1M. Finally, we have calculated the plasmonic response of the NR1M system. This is a 3D nanostructure, therefore a general expression of the K tot matrix in terms of the 3D electron density n 0 (see eq. (5)) needs to be exploited. All calculations have been performed with ωFQ parameters for sodium reported in a previous work. 27 The linear system in eq. (14) has been solved for 24 frequencies in the range be- tween 0.9 and 1.8 eV (unevenly distributed) with an external field aligned along the longitudinal (Z) direction. As for CNT1M and GD1M, the RMSE threshold was set to 10 −5 .
The computed σ Z and the corresponding NI are reported in fig. 15. The longitudinal PRF is placed at about 1.43 eV, which is blue shifted with respect to the longitudinal PRF of CNT1M, due to the fact that the electronic properties of the two materials are different. As for the previously studied carbonbased system, the value of the longitudinal PRF is red-shifted with respect to smaller sodium nanorods reported in a previous work. 27 This is once again due to the lighting rod effect discussed above. Similarly to previous cases, a mean value of about 30 iterations is sufficient to reach convergence.

V. SUMMARY AND CONCLUSIONS
In this paper, we have discussed how to substantially increase the applicability of classical, fully atomistic approaches to the calculation of the plasmonic properties of real-size nanostructures (carbon nanotubes, graphene-based materials and metal nanoparticles). State-of-the-art numerical iterative Krylov-based techniques, GMRES and QMR, have been specified to the recently developed fully atomistic ωFQ model. For the studied structures, we have highlighted the dependence of the convergence rate on physical properties and parameters entering the definition of the model. It is worth remarking that a correlation between local maxima of the number of iterations and plasmon resonance frequencies is observed, showing that the convergence rate is particularly slowed down for high-order plasmons. Finally, on-the-fly implementation of GMRES has allowed the calculation of plasmon resonances of very large realistic nanostructures, composed of roughly one million atoms, which are not affordable by direct algorithms. We have also demonstrated the reliability of our implementation by showing that the required number of iterations to correctly describe the plasmon response of such large structures, is moderate.
The implemented iterative procedures are characterized by three main bottlenecks. On one hand, the number of matrixvector products needed to build approximate solutions to the linear system is associated with a computational complexity of O(N 2 ). Linear scaling in matrix-vector products might be achieved through the fast multipole method (FMM) 62-65 , a numerical technique that can be adopted to build an approximation to the long-range electrostatic forces, which has already been applied to plasmonic substrates. 15 Such an extension will allow to afford systems even larger than those studied in this work. However, in this case the quasi-static approximation on which ωFQ relies, may be no longer valid. Therefore, retardation effects would need to be included in the model, similarly to what has already been proposed for continuum approaches. 25,66 A second bottleneck in our procedure is the lack of any preconditioning of the ωFQ linear system. Different approaches may be exploited to solve dense complex linear systems, and they will be tested in future works for ωFQ. [67][68][69] Finally, in order to study the plasmonic properties of a given nanostructure along a specific spectral region, the ωFQ linear system can be solved independently for each frequency. However, a change in frequency only affects the uniform diagonal shift of the coefficient matrix in eq. (14). Therefore, the shift-invariance property of the Krylov subspaces may be exploited by resorting to the so-called subspace recycling techniques. 70,71 To conclude, another valuable application of ωFQ is its potential applicability to theoretically describe Surface-Enhanced spectroscopies, either based on graphene-based substrates or metal nanoparticles. 60,72 To this end, ωFQ needs to be coupled with a quantum Hamiltonian describing the adsorbed moiety, in a QM/MM fashion. [12][13][14][15][16] Such a further development is on-going in our group and will be the topic of future communications.

VI. ACKNOWLEDGMENT
This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 818064).

Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.