Discontinuous Galerkin methods for nonlinear scalar hyperbolic conservation laws: divided difference estimates and accuracy enhancement

In this paper, an analysis of the accuracy-enhancement for the discontinuous Galerkin (DG) method applied to one-dimensional scalar nonlinear hyperbolic conservation laws is carried out. This requires analyzing the divided differences of the errors for the DG solution. We therefore first prove that the $\alpha$-th order ($1 \le \alpha \le k+1$) divided difference of the DG error in the $L^2$ norm is of order $k + \frac{3}{2} - \frac{\alpha}{2}$ when upwind fluxes are used, under the condition that $|f'(u)|$ possesses a uniform positive lower bound.
By the duality argument, we then derive superconvergence results of order $2k + \frac{3}{2} - \frac{\alpha}{2}$ in the negative-order norm, demonstrating that it is possible to extend the Smoothness-Increasing Accuracy-Conserving (SIAC) filter to nonlinear conservation laws to obtain at least $(\frac{3}{2}k + 1)$-th order superconvergence for post-processed solutions. As a by-product, for variable coefficient hyperbolic equations, we provide an explicit proof of optimal convergence results of order $k+1$ in the $L^2$ norm for the divided differences of DG errors, and thus $(2k+1)$-th order superconvergence in the negative-order norm holds. Numerical experiments are given that confirm the theoretical results.


Introduction
In this paper, we study the accuracy-enhancement of semi-discrete discontinuous Galerkin (DG) methods for solving one-dimensional scalar conservation laws
$$u_t + f(u)_x = 0, \quad (x, t) \in \Omega \times (0, T], \tag{1.1a}$$
$$u(x, 0) = u_0(x), \quad x \in \Omega = (a, b), \tag{1.1b}$$
where $u_0(x)$ is a given smooth function. We assume that the nonlinear flux function f(u) is sufficiently smooth with respect to the variable u and that the exact solution is also smooth. For the sake of simplicity and ease of presentation, we only consider periodic boundary conditions. We show that the α-th order (1 ≤ α ≤ k + 1) divided difference of the DG error in the $L^2$ norm achieves $(k + \frac{3}{2} - \frac{\alpha}{2})$-th order when upwind fluxes are used, under the condition that $|f'(u)|$ possesses a uniform positive lower bound. By using a duality argument, we then derive superconvergence results of order $2k + \frac{3}{2} - \frac{\alpha}{2}$ in the negative-order norm. This allows us to demonstrate that it is possible to extend the post-processing technique to nonlinear conservation laws to obtain at least $(\frac{3}{2}k + 1)$-th order of accuracy. In addition, for the variable coefficient hyperbolic equations discussed in [19], we provide an explicit proof of optimal error estimates of order $k + 1$ in the $L^2$ norm for the divided differences of the DG errors and thus $2k + 1$ in the negative-order norm.
Various superconvergence properties of DG methods have been studied in the past two decades; these not only provide a deeper understanding of DG solutions but are also useful for practitioners. According to different measurements of the error, the superconvergence of DG methods is mainly divided into three categories. The first is superconvergence of the DG error at Radau points, which is typically measured in the discrete $L^2$ norm and is useful for resolving waves. The second is superconvergence of the DG solution towards a particular projection of the exact solution (supercloseness), measured in the standard $L^2$ norm, which lays a firm theoretical foundation for the excellent behaviour of DG methods in long-time simulations as well as adaptive computations. The last is superconvergence of the post-processed solution established via negative-order norm error estimates, which enables us to obtain a higher order approximation by simply post-processing the DG solution with a specially designed kernel at the very end of the computation. In what follows, we review some superconvergence results for the aforementioned three properties, restricting ourselves to hyperbolic equations to save space. For superconvergence of DG methods for other types of PDEs, we refer to [21].
Let us briefly mention some superconvergence results related to the Radau points and supercloseness of DG methods for hyperbolic equations. Adjerid and Baccouch [1][2][3] studied the superconvergence property as well as the a posteriori error estimates of DG methods for one- and two-dimensional linear steady-state hyperbolic equations, in which superconvergence of order $k + 2$ and $2k + 1$ is obtained at downwind-biased Radau points and downwind points, respectively. Here and below, k is the highest polynomial degree of the discontinuous finite element space. For time-dependent linear hyperbolic equations, Cheng and Shu [9] investigated supercloseness, obtaining superconvergence of order $k + \frac{3}{2}$ towards a particular projection of the exact solution by virtue of the construction and analysis of so-called generalized slopes. Later, by using a duality argument, Yang and Shu [24] proved superconvergence results of order $k + 2$ for the DG error at downwind-biased points as well as for cell averages, which imply a sharp $(k+2)$-th order supercloseness result. By constructing a special correction function and choosing a suitable initial discretization, Cao et al. [7] established a supercloseness result towards a newly designed interpolation function. Further, based on this supercloseness result, they proved $(2k+1)$-th order superconvergence of the DG error at the downwind points as well as for the domain average, $(k+2)$-th order superconvergence at the downwind-biased Radau points, and a superconvergent rate of $k + 1$ for the derivative at interior Radau points. We remark that the work of [7,24] indicates a possible link between supercloseness and superconvergence at Radau points. For time-dependent nonlinear hyperbolic equations, Meng et al. [18] proved a supercloseness result of order $k + \frac{3}{2}$. For superconvergent a posteriori error estimates of the spatial derivative of the DG error for nonlinear hyperbolic equations, see [4].
Let us now mention in particular some superconvergence results of DG methods regarding negative-order norm estimates and post-processing for hyperbolic equations. The basic idea of post-processing is to convolve the numerical solution with a local averaging operator, with the goal of obtaining a better, and typically higher order, approximation. Motivated by the work of Bramble and Schatz in the framework of continuous Galerkin methods for elliptic equations [5], Cockburn et al. [11] established the theory of post-processing techniques for DG methods for hyperbolic equations with the aid of negative-order norm estimates. The extension of this post-processing technique was later studied in depth by Ryan et al. in several directions, e.g. for general boundary conditions [20], for nonuniform meshes [13], and for applications in improving the visualization of streamlines [22], in which it is labeled a Smoothness-Increasing Accuracy-Conserving (SIAC) filter. For the extension of the SIAC filter to linear convection-diffusion equations, see [15].
By the post-processing theory [5,11], it is well known that negative-order norm estimates of divided differences of the DG error are important tools to derive superconvergent error estimates of the post-processed solution in the L 2 norm. Note that for purely linear equations [11,15], once negative-order norm estimates of the DG error itself are obtained, they trivially imply negative-order norm estimates for the divided differences of the DG error. However, the above framework is no longer valid for variable coefficient or nonlinear equations. In this case, in order to derive superconvergent estimates about the post-processed solution, both the L 2 norm and negative-order norm error estimates of divided differences should be established. In particular, for variable coefficient hyperbolic equations, although negative-order norm error estimates of divided differences are given in [19], the corresponding L 2 norm estimates are not provided. For nonlinear hyperbolic conservation laws, Ji et al. [16] showed negative-order norm estimates for the DG error itself, leaving the estimates of divided differences for future work.
For the nonlinear hyperbolic equations under consideration in this paper, it is therefore important and interesting to address the above issues by establishing both the $L^2$ norm and negative-order norm error estimates for the divided differences. The major part of this paper shows $L^2$ norm error estimates for divided differences, which help us obtain a higher order of accuracy in the negative-order norm and thus superconvergence of the post-processed solutions. We remark that the proof requires $|f'(u)|$ to have a uniform positive lower bound, due to the technicality of the argument. The generalization from purely linear problems [11,15] to the nonlinear hyperbolic equations of this paper involves several technical difficulties. One is how to establish the important relations between the spatial derivatives and time derivatives of a particular projection of divided differences of DG errors. Even after the spatial derivatives of the error are switched to time derivatives, it is still difficult to analyze the time derivatives of the error; for more details, see Sect. 3.2 and also the appendix. Another important technicality is how to construct a suitable dual problem for the divided difference of the nonlinear hyperbolic equations. Moreover, the two-dimensional extension appears nontrivial, especially for establishing the relations between spatial derivatives and time derivatives of the errors. The main tool employed in deriving $L^2$ norm error estimates for the divided differences is an energy analysis. To deal with the nonlinearity of the flux functions, Taylor expansion is used, following a standard technique in error estimates for nonlinear problems [25]. We would like to remark that the superconvergence analysis in this paper indicates a possible link between supercloseness and negative-order norm estimates.
This paper is organized as follows. In Sect. 2, we give the DG scheme for divided differences of nonlinear hyperbolic equations, and present some preliminaries about the discontinuous finite element space. In Sect. 3, we state and discuss the L 2 norm error estimates for divided differences of nonlinear hyperbolic equations, and then display the main proofs followed by discussion of variable coefficient hyperbolic equations. Section 4 is devoted to the accuracy-enhancement superconvergence analysis based on negative-order norm error estimates of divided differences. In Sect. 5, numerical experiments are shown to demonstrate the theoretical results. Concluding remarks and comments on future work are given in Sect. 6. Finally, in the appendix we provide the proofs for some of the more technical lemmas.

The DG scheme
In this section, we follow [11,12] and present the DG scheme for divided differences of the problem (1.1). Let $x_{j+\frac12}$, for $j = 0, \ldots, N$, denote the interfaces of a uniform partition of $\Omega$ with mesh size h and cell centers $x_j = (x_{j-\frac12} + x_{j+\frac12})/2$. Since we are focused on error analysis of both the $L^2$ norm and the negative-order norm for divided differences of the DG solution, and the problem under consideration is assumed to be periodic, we introduce two overlapping uniform (translation invariant) meshes for $\Omega$, namely $I_j = (x_{j-\frac12}, x_{j+\frac12})$ and $I_{j+\frac12} = (x_j, x_{j+1})$. Associated with these meshes, we define the discontinuous finite element space $V_h^\alpha$, whose elements are piecewise polynomials in $P^k(I'_j)$, where $P^k(I'_j)$ denotes the set of polynomials of degree up to k defined on the cell $I'_j$. Here and below, $\partial_h^\alpha$ represents the α-th order divided difference of a given function, whose definition is given in (2.6a). Clearly, $V_h^\alpha$ is a piecewise polynomial space on the mesh $I'_j = I_j$ for even α (including α = 0) and on the mesh $I'_j = I_{j+\frac12}$ for odd α of the DG scheme. For simplicity, for even α we use $V_h$ to denote the standard finite element space of degree k defined on the cells $I_j$, if there is no confusion. Since functions in $V_h^\alpha$ may have discontinuities across element interfaces, we denote by $w_i^-$ and $w_i^+$ the values of w(x) at the discontinuity point $x_i$ from the left cell and the right cell, respectively. Moreover, we use $[\![w]\!] = w^+ - w^-$ and $\{\!\{w\}\!\} = \frac12 (w^+ + w^-)$ to represent the jump and the mean of w(x) at each element boundary point.
The α-th order divided difference of the nonlinear hyperbolic conservation law (1.1a) reads
$$\partial_h^\alpha u_t + \partial_h^\alpha f(u)_x = 0. \tag{2.1}$$
Clearly, (2.1) reduces to (1.1) when α = 0. The approximation of the semi-discrete DG method for solving (2.1) then becomes: find the unique function $u_h(\cdot, t) \in V_h$ such that (2.2) holds for all $v_h \in V_h^\alpha$ and all j = 1, …, N. Note that, by (2.6a), for any $x \in I_j$ and t, $\partial_h^\alpha u_h(x, t)$ is a linear combination of the values of $u_h$ at α + 1 equally spaced points of spacing h, namely $x - \frac{\alpha}{2}h, \ldots, x + \frac{\alpha}{2}h$. Here and in what follows, $(\cdot,\cdot)_j$ denotes the usual inner product in $L^2(I_j)$, and $H_j(\cdot,\cdot)$ is the DG spatial discretization operator defined on each cell. We point out that, in order to obtain a useful bound for the $L^2$ norm error estimates of divided differences, the numerical flux $\hat f_{j+\frac12}$ is chosen to be an upwind flux, for example the well-known Godunov flux. Moreover, the analysis requires that $|f'(u)|$ has a uniform positive lower bound. Without loss of generality, throughout the paper we only consider $f'(u) \ge \delta > 0$, and thus $\hat w = w^-$, which gives the flux form (2.3b). For the periodic boundary conditions under consideration in this paper, we use H to denote the summation of $H_j$ over the cells $I'_j$, where $(w, v) = \sum_{j=1}^N (w, v)_j$ represents the inner product in $L^2(\Omega)$. Note that we sum with respect to j′ instead of j to distinguish the two overlapping meshes, since j′ = j for even α and j′ = j + ½ for odd α.
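To make the upwind reduction concrete: under the assumption $f'(u) \ge \delta > 0$, the Godunov flux always picks the value from the left. The sketch below is our own illustration (the helper `godunov_flux` and the brute-force sampling are not part of the paper) of the min/max characterization of the Godunov flux and its reduction to $f(w^-)$ for monotone increasing flux:

```python
import numpy as np

def godunov_flux(f, ul, ur, n=401):
    """Godunov flux for a scalar conservation law:
    min of f over [ul, ur] if ul <= ur, max over [ur, ul] otherwise."""
    if ul <= ur:
        return f(np.linspace(ul, ur, n)).min()
    return f(np.linspace(ur, ul, n)).max()

f = lambda u: 0.5 * u ** 2  # Burgers flux, f'(u) = u

# With both states positive (f' > 0), Godunov is simply the upwind flux f(u^-):
print(godunov_flux(f, 1.0, 2.0))  # 0.5 = f(1.0)
```

When $f'$ changes sign between the two states, the min/max definition is genuinely needed; under the paper's assumption $f'(u) \ge \delta > 0$ it always returns $\hat w = w^-$.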

Preliminaries
We will adopt the following convention for different constants. We denote by C a positive constant independent of h, which may depend on the exact solution of Eq. (2.1) and could have a different value in each occurrence. To emphasize the nonlinearity of the flux f(u), we employ $C_\star$ to denote a nonnegative constant depending solely on the maximum of a high order derivative $|f^{(m)}|$ ($m \ge 2$). We remark that $C_\star = 0$ for a linear flux $f(u) = cu$ and for a variable coefficient flux $f(u) = a(x)u$, where c is a constant and a(x) is a given smooth function. Prior to analyzing the $L^2$ norm and the negative-order norm error estimates of divided differences, let us present some notation, definitions, properties of the DG discretization operator, and basic properties of SIAC filters. These preliminary results will be used later in the proof of the superconvergence property.

Sobolev spaces and norms
We adopt standard notation for Sobolev spaces. For any integer s ≥ 0, we denote by $W^{s,p}(D)$ the Sobolev space on a subdomain $D \subset \Omega$, equipped with the norm $\|\cdot\|_{s,p,D}$. In particular, if p = 2, we set $W^{s,p}(D) = H^s(D)$ and $\|\cdot\|_{s,p,D} = \|\cdot\|_{s,D}$, and further, if s = 0, we set $\|\cdot\|_{s,D} = \|\cdot\|_D$. Throughout the paper, when $D = \Omega$ we omit the index D for convenience. Furthermore, the norms of the broken Sobolev spaces are taken in the time-integrated sense, e.g. $\int_0^T \|\cdot\|_{s,D}\, dt$. Additionally, we denote by $\|\cdot\|_{\Gamma_h}$ the standard $L^2$ norm on the cell interfaces of the mesh. To simplify notation in our later analysis, following [23], we also introduce the so-called jump seminorm. In the post-processing framework, it is useful to consider the negative-order norm, defined as: given $\ell > 0$ and domain $\Omega$,
$$\|u\|_{-\ell,\Omega} = \sup_{\phi \in C_0^\infty(\Omega)} \frac{(u, \phi)_\Omega}{\|\phi\|_{\ell,\Omega}}.$$

Properties for divided differences
For any function w and integer γ, the following standard notation for the central divided difference is used:
$$\partial_h^\gamma w(x) = \frac{1}{h^\gamma} \sum_{i=0}^{\gamma} (-1)^i \binom{\gamma}{i}\, w\Big(x + \big(\tfrac{\gamma}{2} - i\big)h\Big). \tag{2.6a}$$
Note that this notation is still valid even if w is a piecewise function with possible discontinuities at cell interfaces. In our later analysis, we repeatedly use the following properties of divided differences. For any functions w and v,
$$\partial_h(wv)(x) = \partial_h w(x)\, v\Big(x + \tfrac{h}{2}\Big) + w\Big(x - \tfrac{h}{2}\Big)\, \partial_h v(x), \tag{2.6b}$$
which is the so-called Leibniz rule for the divided difference. Moreover, for sufficiently smooth functions w(x), by using Taylor expansion with the integral form of the remainder, we can easily verify that $\partial_h^\gamma w$ is a second order approximation to $\partial_x^\gamma w$, namely
$$\partial_h^\gamma w(x) = \partial_x^\gamma w(x) + C_\gamma h^2\, \partial_x^{\gamma+2} \psi_\gamma(x), \tag{2.6c}$$
where $C_\gamma$ is a positive constant and $\psi_\gamma$ is a smooth function; for example, $C_\gamma = 1/8, 1, 3/32$ for γ = 1, 2, 3. Here and below, $\partial_x^\gamma(\cdot)$ denotes the γ-th order partial derivative with respect to the variable x; likewise for $\partial_t^\gamma(\cdot)$. The last identity is the so-called summation by parts, namely, for periodic functions,
$$(\partial_h w, v) = -(w, \partial_h v). \tag{2.6d}$$
In addition to the properties of divided differences of a single function w(x), we also need properties of divided differences of a composition of two or more functions. We emphasize that properties (2.6a), (2.6b), (2.6d) remain valid whether w is a single function or a composition. As an extension from the single function case (2.6c) to the composite function case, the following property (2.6e) is subtle and requires a more delicate argument. Without loss of generality, if w is the composition of two smooth functions r and u, i.e. $w(x) := r(u(x))$, we can prove the identity
$$\partial_h^\gamma r(u(x)) = \partial_x^\gamma r(u(x)) + C_\gamma h\, \Psi_\gamma(x), \tag{2.6e}$$
where $C_\gamma$ is a positive constant and $\Psi_\gamma$ is a smooth function. We can see that, unlike (2.6c), the divided difference of a composite function is only a first order approximation to its derivative of the same order. This, however, is sufficient for our analysis; see Corollary 1.
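As a quick numerical illustration of definition (2.6a) and the second order accuracy property (2.6c), the following sketch (the helper `divided_diff` is ours, written only for this check) evaluates a central divided difference of a smooth function and observes the $O(h^2)$ error decay:

```python
import math
from math import comb

def divided_diff(w, x, h, gamma):
    """Central divided difference from (2.6a): a gamma-th order
    difference of w over gamma + 1 equally spaced points around x."""
    return sum((-1) ** i * comb(gamma, i) * w(x + (gamma / 2 - i) * h)
               for i in range(gamma + 1)) / h ** gamma

x = 0.3
exact = -math.cos(x)                        # third derivative of sin
e1 = abs(divided_diff(math.sin, x, 0.10, 3) - exact)
e2 = abs(divided_diff(math.sin, x, 0.05, 3) - exact)
print(e1 / e2)                              # ≈ 4, i.e. second order in h
```

Halving h divides the error by about four, matching the $h^2$ remainder in (2.6c).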
It is worth pointing out that in (2.6e), $\partial_x^\gamma r(u(x))$ and $\partial_h^\gamma r(u(x))$ should be understood in the sense of the chain rule for high order derivatives and for divided differences of composite functions, respectively. In what follows, we use $f[x_0, \ldots, x_\gamma]$ to denote the standard γ-th order Newton divided difference, that is
$$f[x_0] = f(x_0), \qquad f[x_0, \ldots, x_\gamma] = \frac{f[x_1, \ldots, x_\gamma] - f[x_0, \ldots, x_{\gamma-1}]}{x_\gamma - x_0}.$$
It is easy to verify that $\partial_h^\gamma w(x) = \gamma!\, w[x_0, \ldots, x_\gamma]$ for the equally spaced points $x_i = x + (\frac{\gamma}{2} - i)h$, $i = 0, \ldots, \gamma$. For completeness, we list the chain rule for the derivatives (the well-known Faà di Bruno formula) and its analogue for divided differences [14]; the latter expresses $\partial_h^\gamma r(u(x))$ in terms of divided differences of u and of r at the values $u_i = u(x_i)$, where the outer sum is over all $\ell = 1, \ldots, \gamma$ and nonnegative integers, and the inner sum is over integers $\alpha_1, \ldots, \alpha_{\ell-1}$ such that $\ell \le \alpha_1 \le \cdots \le \alpha_{\ell-1} \le \gamma$.
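The Newton divided difference just introduced can be computed with the classical recursive table; the sketch below (our own helper, for illustration only) confirms the mean value property recalled next, $f[x_0, \ldots, x_\gamma] = f^{(\gamma)}(\xi)/\gamma!$, in the simplest case of a cubic, whose third divided difference equals its leading coefficient at any nodes:

```python
def newton_divided_diff(f, xs):
    """Newton divided difference f[x_0, ..., x_n] via the recursion
    f[x_i..x_j] = (f[x_{i+1}..x_j] - f[x_i..x_{j-1}]) / (x_j - x_i)."""
    d = [f(x) for x in xs]
    for level in range(1, len(xs)):
        d = [(d[i + 1] - d[i]) / (xs[i + level] - xs[i])
             for i in range(len(d) - 1)]
    return d[0]

cubic = lambda u: u ** 3   # f'''(u) / 3! = 6 / 6 = 1 for every u
print(newton_divided_diff(cubic, [0.0, 0.1, 0.2, 0.3]))  # ≈ 1.0
```

The nodes need not be equally spaced, which is exactly the situation met below when the arguments are the values $u_i = u(x_i)$.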
It follows from the mean value theorem for divided differences that $r[u_0, \ldots, u_\ell] = r^{(\ell)}(\zeta)/\ell!$ for some ζ between the smallest and the largest of the values $u_i$. Consequently, by (2.7), the coefficients appearing in the divided difference chain rule are uniformly bounded. We are now ready to prove (2.6e) for the relation between the divided difference and the derivative of composite functions. Using an argument similar to that in the proof of (2.6c) for each term $A_{\ell,\gamma} u$, together with the smoothness of the $u_i$ and the fact that the $u_i$ are not necessarily equally spaced, with $u_{\gamma/2} = u(x)$ and $\psi(u_0, u_1, \ldots, u_\gamma)$ being smooth functions, we obtain the relation (2.6e).

The inverse and projection properties
Now we list some inverse properties of the finite element space $V_h^\alpha$. For any $p \in V_h^\alpha$, there exists a positive constant C independent of p and h such that (i) $\|p_x\| \le C h^{-1} \|p\|$, (ii) $\|p\|_{\Gamma_h} \le C h^{-1/2} \|p\|$, and (iii) $\|p\|_\infty \le C h^{-1/2} \|p\|$. Next, we introduce the standard $L^2$ projection of a function $q \in L^2(\Omega)$ into the finite element space $V_h$, denoted by $P_h q$: it is the unique function in $V_h$ satisfying $(P_h q - q, v_h) = 0$ for all $v_h \in V_h$. Note that the proof of accuracy-enhancement of DG solutions for linear equations relies only on an $L^2$ projection of the initial condition [11,15]. However, for variable coefficient and nonlinear hyperbolic equations, a suitable choice of the initial condition and a superconvergence relation between the spatial derivative and time derivative of a particular projection of the error must be established, since both the $L^2$ norm and negative-order norm error estimates of divided differences need to be analyzed. In what follows, we recall the two Gauss-Radau projections $P_h^\pm$ into $V_h$, following a standard technique in DG analysis [8,25]. For any given function $q \in H^1(\Omega_h)$ and an arbitrary element $I_j$, the special Gauss-Radau projection $P_h^- q$ is the unique function in $V_h$ satisfying, for each j,
$$(P_h^- q)\big(x_{j+\frac12}^-\big) = q\big(x_{j+\frac12}\big), \tag{2.9a}$$
$$(P_h^- q - q, v_h)_j = 0 \quad \text{for all } v_h \in P^{k-1}(I_j), \tag{2.9b}$$
and $P_h^+ q$ is defined analogously with exact collocation at the left end point $x_{j-\frac12}^+$. We remark that the exact collocation at one of the end points of each cell, together with the orthogonality to polynomials of degree up to k − 1, plays an important role for the Gauss-Radau projections $P_h^\pm$ and is used repeatedly in the proof.
Denoting by $q^e = q - P_h^\pm q$ the projection error, a standard scaling argument [6,10] easily yields, for smooth enough q(x),
$$\|q^e\| + h^{1/2} \|q^e\|_{\Gamma_h} \le C h^{k+1} \|q\|_{k+1}, \tag{2.10a}$$
$$\|q^e\|_\infty \le C h^{k+1} \|q\|_{k+1,\infty}. \tag{2.10b}$$
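To illustrate the $(k+1)$-th order approximation property (2.10a), the sketch below (quadrature order, mesh sizes and helper names are our own choices, not the paper's) computes the $L^2$ projection error of a smooth function into piecewise $P^k$ on two meshes and checks the observed order for k = 1:

```python
import numpy as np
from numpy.polynomial import legendre

def l2_projection_error(u, k, N):
    """L2 error of the projection of u onto piecewise P^k on a uniform
    mesh of (0, 1) with N cells, computed cellwise in a Legendre basis."""
    nodes, wts = legendre.leggauss(k + 3)      # reference Gauss rule on (-1, 1)
    h, err2 = 1.0 / N, 0.0
    for j in range(N):
        x = (j + 0.5) * h + 0.5 * h * nodes    # quadrature points in cell j
        uv = u(x)
        proj = np.zeros_like(x)
        for m in range(k + 1):                 # orthogonal Legendre modes
            Pm = legendre.legval(nodes, [0.0] * m + [1.0])
            proj += (wts * uv * Pm).sum() / (wts * Pm * Pm).sum() * Pm
        err2 += 0.5 * h * (wts * (uv - proj) ** 2).sum()
    return np.sqrt(err2)

u = lambda x: np.sin(2 * np.pi * x)
e1 = l2_projection_error(u, 1, 16)
e2 = l2_projection_error(u, 1, 32)
print(np.log2(e1 / e2))                        # ≈ 2 = k + 1
```

The observed rate of about 2 for k = 1 matches the $h^{k+1}$ bound in (2.10a); the Gauss-Radau projections $P_h^\pm$ satisfy the same order with the collocation constraint added.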

The properties of the DG spatial discretization
To perform the L 2 error estimates of divided differences, several properties of the DG operator H are helpful, which are used repeatedly in our proof; see Sect. 3.
Proof Let us first prove (2.11b), which is a straightforward consequence of the definition of H after a simple integration by parts. We would like to emphasize that (2.11b) is still valid even if the smooth function r and $w \in V_h^\alpha$ are evaluated at different points, e.g. x and x + h/2, etc., as only integration by parts and the boundedness of r are used here.
To prove (2.11a), we consider the equivalent strong form (2.4b) of H. An application of the Cauchy-Schwarz inequality and the inverse inequality (ii) leads to the desired bound. This completes the proof of Lemma 1.

Corollary 1
Under the same conditions as in Lemma 1, we have, for small enough h, the corresponding bound for divided differences. Proof The case α = 0 has been proved in Lemma 1. For general α ≥ 1, we start from the relation (2.6e) for $\partial_h^\alpha r$, which gives a decomposition with C a positive constant and $\Psi_\alpha$ a smooth function. Next, applying (2.11a) of Lemma 1 to each resulting term finishes the proof of Corollary 1.

Next, we estimate the remaining term on each cell. Clearly, on each element $I_j$, $|r(u) - r(u_j)| \le Ch$ due to the smoothness of r and u. Using the orthogonality property (2.9b) of $P_h^-$ again, we arrive at the desired bound, where we have used the Cauchy-Schwarz inequality, the inverse inequality (i) and the approximation property (2.10a) consecutively.

Corollary 2 Suppose that r(u(x, t)) is smooth with respect to each variable. Then, for any $1 \le \alpha \le k + 1$ and small enough h, an estimate analogous to that of Lemma 2 holds for the α-th order divided differences.
Proof The case α = 0 has been proved in Lemma 2. For α ≥ 1, we use the Leibniz rule (2.6b) and take into account the fact that the divided difference operator $\partial_h$ commutes with the projection operator; this yields the decomposition (2.15). Clearly, by (2.6e), ř is also a smooth function with respect to each variable, with leading term given by $\partial_x r$ evaluated at a shifted point. To complete the proof, we need only apply the same procedure as in the proof of Lemma 2 to each H term on the right side of (2.15).

Regularity for the variable coefficient hyperbolic equations
Since the dual problem for the nonlinear hyperbolic equation is a variable coefficient equation, we need to recall a regularity result.

Lemma 3 [16] Consider the variable coefficient hyperbolic equation (2.16) with a periodic boundary condition for all t, where a(x, t) is a given smooth periodic function. For any $\ell \ge 0$, fix time t and assume $a(x, t) \in L^\infty([0, T]; W^{2\ell+1,\infty}(\Omega))$; then the solution of (2.16) satisfies the corresponding regularity property, where C is a constant depending on $\|a\|_{L^\infty([0,T]; W^{2\ell+1,\infty}(\Omega))}$.

SIAC filters
The SIAC filters are used to extract the hidden accuracy of DG methods by means of a post-processing technique, which enhances the accuracy and reduces oscillations of the DG errors. The post-processing is a convolution with a kernel function $K_h^{\nu,k+1}$ that has compact support and is a linear combination of B-splines, scaled by the uniform mesh size:
$$K_h^{\nu,k+1}(x) = \frac{1}{h} \sum_{\gamma} c_\gamma^{\nu,k+1}\, \psi^{(k+1)}\Big(\frac{x}{h} - \gamma\Big),$$
where $\psi^{(k+1)}$ is the B-spline of order k + 1 obtained by convolving the characteristic function $\psi^{(1)} = \chi_{(-1/2,\,1/2)}$ of the interval (−1/2, 1/2) with itself k times. Additionally, the kernel $K_h^{\nu,k+1}$ is required to reproduce polynomials of degree ν − 1 by convolution, which determines the weights $c_\gamma^{\nu,k+1}$. For more details, see [11]. The post-processing theory of SIAC filters is given in the following theorem.
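The repeated-convolution construction of the B-spline $\psi^{(k+1)}$ can be sketched numerically as follows (grid resolution and helper name are our own choices; the point is that each convolution preserves unit mass, so every B-spline integrates to one, which is what lets the kernel reproduce constants):

```python
import numpy as np

def bspline_on_grid(order, dx=1e-3, half_width=4.0):
    """psi^(order): the characteristic function of (-1/2, 1/2)
    convolved with itself order - 1 times, sampled on a fine grid."""
    x = np.arange(-half_width, half_width + dx / 2, dx)
    box = ((x > -0.5) & (x < 0.5)).astype(float)
    psi = box.copy()
    for _ in range(order - 1):
        psi = np.convolve(psi, box, mode='same') * dx
    return x, psi

x, psi3 = bspline_on_grid(3)           # quadratic B-spline, support (-3/2, 3/2)
print(round(psi3.sum() * 1e-3, 2))     # ≈ 1.0: unit mass is preserved
```

The resulting $\psi^{(3)}$ is the piecewise quadratic B-spline supported on (−3/2, 3/2); convolving once more would give the cubic B-spline used for k = 3.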

Theorem 1 (Bramble and Schatz [5,11])

Let $\Omega_0 + 2\,\mathrm{supp}\,(K_h^{\nu,k+1}) \Subset \Omega_1 \subset \Omega$ and let U be any approximation to u. Then
$$\big\|u - K_h^{\nu,k+1} \star U\big\|_{0,\Omega_0} \le \frac{h^\nu}{\nu!}\, C_1\, |u|_{\nu,\Omega_1} + C_2 \sum_{0 \le \alpha \le k+1} \big\|\partial_h^\alpha (u - U)\big\|_{-(k+1),\Omega_1},$$
where $C_1$ and $C_2$ depend on $\Omega_0$, $\Omega_1$, k and ν, but are independent of h.

L 2 norm error estimates for divided differences
By the post-processing theory [5,11] (also see Theorem 1), it is essential to derive negative-order norm error estimates for divided differences, which depend heavily on their L 2 norm estimates. However, for both variable coefficient equations and nonlinear equations, it is highly nontrivial to derive L 2 norm error estimates for divided differences, and the technique used to prove convergence results for the DG error itself needs to be significantly changed.

The main results in L 2 norm
Let us begin by denoting e = u − u_h, the error between the exact solution and the numerical solution. Next, we split it into two parts: the projection error, denoted by η = u − Q_h u, and the projection of the error, denoted by ξ = Q_h u − u_h ∈ V_h. Here the projection Q_h is defined at each time level t according to the sign of $f'(u)$; specifically, for any t ∈ [0, T], Q_h = $P_h^-$ is used when $f'(u) > 0$ (and Q_h = $P_h^+$ when $f'(u) < 0$). We are now ready to state the main theorem for the $L^2$ norm error estimates.

Theorem 2 Let u be the sufficiently smooth exact solution of (1.1) with $|f'(u)| \ge \delta > 0$, and let $u_h$ be the DG solution of scheme (2.2) with an upwind flux. Then, for $0 \le \alpha \le k + 1$ and small enough h,
$$\|\partial_h^\alpha \xi\| \le C h^{k + \frac{3}{2} - \frac{\alpha}{2}}, \tag{3.1}$$
where the positive constant C depends on u, δ, T and f, but is independent of h.

Corollary 3
Under the same conditions as in Theorem 2, if in addition α ≥ 1, we have the following error estimate:
$$\|\partial_h^\alpha (u - u_h)\| \le C h^{k + \frac{3}{2} - \frac{\alpha}{2}}. \tag{3.2}$$
Proof Since $\partial_h^\alpha e = \partial_h^\alpha \eta + \partial_h^\alpha \xi$ and $\|\partial_h^\alpha \eta\| \le C h^{k+1}$ by the approximation error estimate (2.10a), the error estimate (3.2) follows by combining the triangle inequality and (3.1).
Remark 1 Clearly, the L 2 error estimates for the divided differences in Theorem 2 and Corollary 3 also hold for the variable coefficient equation (2.1) with f (u) = a(x)u and |a(x)| ≥ δ > 0. In fact, for variable coefficient equations, we can obtain optimal (k + 1)th order in the L 2 norm and thus (2k + 1)th order in the negative-order norm; see Sect. 3.3.

Remark 2
The result with α = 0 in Theorem 2 is indeed a superconvergence result towards a particular projection of the exact solution (supercloseness) that has been established in [18], and it is the starting point for estimating $\partial_h^\alpha \xi$ with α ≥ 1. For completeness, we list the superconvergence result for ξ (the zeroth order divided difference),
$$\|\xi\| \le C h^{k + \frac{3}{2}}, \tag{3.4a}$$
together with the companion estimates (3.4b), (3.4c) for the jump seminorm and related quantities, where, on each element $I_j$, a decomposition of ξ into a constant part and $S(x) \in P^{k-1}(I_j)$ has been used. Note that the proof of such superconvergence results requires that $|f'(u)|$ is uniformly lower bounded by a positive constant; for more details, see [18].
In the proof of Theorem 2, we have also obtained a generalized version of the $L^2$ norm estimates of ξ in terms of the divided differences, their time derivatives, and spatial derivatives. To simplify notation, for an arbitrary multi-index $\beta = (\beta_1, \beta_2)$, we denote by $\partial_M^\beta(\cdot) = \partial_h^{\beta_1} \partial_t^{\beta_2}(\cdot)$ the mixed operator containing divided differences and time derivatives of a given function. Corollary 4 Under the same conditions as in Theorem 2, for $\beta_0 = 0, 1$ and a multi-index $\beta = (\beta_1, \beta_2)$, we have the following unified error estimate:
$$\|\partial_x^{\beta_0} \partial_M^\beta \xi\| \le C h^{k + \frac{3}{2} - \frac{|\beta'|}{2}},$$
where $|\beta'| = \beta_0 + |\beta|$.

Proof of the main results in the L 2 norm
Similar to the discussion of the DG discretization operator properties in Sect. 2.2.4, without loss of generality we consider only the case $f'(u) \ge \delta > 0$ on each cell interface, choose the projection Q_h = $P_h^-$ on each cell, and choose the initial condition as $u_h(\cdot, 0) = P_h^- u_0$, so that ξ(0) = 0. Since the case α = 0 has already been proved in [18] (see (3.4a)), we need only consider 1 ≤ α ≤ k + 1. In order to clearly display the main ideas of how to perform the $L^2$ norm error estimates for divided differences, in the following two sections we present the detailed proof of Theorem 2 for α = 1 and α = 2, respectively; the general cases 3 ≤ α ≤ k + 1 (k ≥ 2) can be proved by induction and are omitted to save space.

Analysis for the first order divided difference
For α = 1, we apply the DG scheme (2.2) to the first order divided difference. By Galerkin orthogonality and summing over all j, we obtain the error equation. To simplify notation, we denote $\partial_h e := \bar e = \bar\eta + \bar\xi$ with $\bar\eta = \partial_h \eta$ and $\bar\xi = \partial_h \xi$. If we now take $v_h = \bar\xi$, we get the identity (3.7). The estimate for the right side of (3.7) is complicated, since it contains integral terms involving mixed order divided differences of ξ, namely ξ and $\bar\xi$; it is given in the following lemma.

Lemma 4 Suppose that the conditions in Theorem 2 hold. Then we have the estimate (3.8),
where the positive constants C and $C_\star$ are independent of h and $u_h$.
Proof Let us start by using the second order Taylor expansion with respect to the variable u to write out the nonlinear terms, namely $f(u_h) = f(u) - f'(u)e + R_1 e^2$, where $R_1 = \int_0^1 (1 - \nu) f''(u - \nu e)\, d\nu$ is the integral form of the remainder of the second order Taylor expansion. We would like to emphasize that the various order spatial derivatives, time derivatives and divided differences of $R_1$ are all bounded uniformly due to the smoothness of f and u. Thus, the right side of (3.7) splits into three terms J, K and L, which will be estimated separately below.
To estimate J, we employ the Leibniz rule (2.6b) to split it as $J = J_1 + J_2$, where we have omitted the dependence on x for convenience when there is no confusion, since the proof of (2.11b) is still valid even if $f'(u)$ and $\bar\xi$ are evaluated at different x; see the proof of (2.11b) in Sect. 2.2.4. A direct application of Lemma 1, together with the assumption $f'(u) \ge \delta > 0$ and (2.11b), leads to the estimate (3.10a) for $J_1$. By Corollary 1, we arrive at the estimate (3.10b) for $J_2$. Substituting (3.4a)-(3.4c) into (3.10b) and combining with (3.10a), we obtain, after a straightforward application of Young's inequality, the bound (3.11) for J. Let us now move on to the estimate of K. By Corollary 2, we obtain the bound (3.12). To estimate L, we first employ the identity (2.6b) to rewrite $\partial_h(R_1 e^2)$; consequently, L splits into the high order nonlinear terms $H(D_1, \bar\xi)$ and $H(D_2, \bar\xi)$. It is easy to derive the bound (3.13) for $H(D_1, \bar\xi)$, where in the first step we use the Cauchy-Schwarz inequality, in the second step the inverse properties (i) and (ii), and in the last step the interpolation properties (3.3). We see that, in order to deal with the nonlinearity of f, we still need a bound for $\|e\|_\infty$. Due to the superconvergence result (3.4a), we conclude, by combining the inverse inequality (iii) and the approximation property (2.10b), that $\|e\|_\infty \le C h^{k+1}$ (3.14). Therefore, for small enough h, the bound (3.15a) follows. By an analysis similar to that in the proof of (3.13), we obtain for $H(D_2, \bar\xi)$, as a consequence of (3.14) and (3.4a), the bound (3.15b). A combination of (3.15a) and (3.15b) produces the bound (3.16) for L. To complete the proof of Lemma 4, we need only combine (3.11), (3.12), (3.16) and use Young's inequality.
We are now ready to derive the L² norm estimate for ξ̄. To do so, let us begin by inserting the estimate (3.8) into (3.7) and taking into account the bound for η̄ in (3.3), and thus for η̄_t, to get, after an application of the Cauchy–Schwarz and Young inequalities, a differential inequality for ‖ξ̄‖². Next, we integrate this inequality with respect to time between 0 and T and note that ξ̄(0) = 0, due to ξ(0) = 0, to obtain an integral inequality, where we have used the superconvergence result (3.4a). An application of Gronwall's inequality leads to the desired result. This finishes the proof of Theorem 2 for α = 1.
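Schematically, the Gronwall step just performed has the following structure (constants absorbed into C; the precise right-hand side is the one in (3.8)):

```latex
\frac{d}{dt}\|\bar{\xi}\|^2 \le C\,\|\bar{\xi}\|^2 + C\,h^{2k+2}
\quad\Longrightarrow\quad
\|\bar{\xi}(T)\|^2 \le e^{CT}\bigl(\|\bar{\xi}(0)\|^2 + CT\,h^{2k+2}\bigr),
```

and since ξ̄(0) = 0 this yields ‖ξ̄(T)‖ ≤ Ch^{k+1}, which is the claimed order k + 3/2 − α/2 with α = 1.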

Remark 3
We can see that the estimates (3.17) for the L² norm and the jump seminorm of ξ̄ are based on the corresponding results for ξ in Remark 2, and are half an order lower than those for ξ. This is mainly due to the mixing of divided differences of ξ of different orders, namely ξ and ξ̄, and thus the application of the inverse property (ii). It is natural that the proof for a higher order divided difference of ξ, say ∂²_h ξ, should be based on the corresponding lower order divided difference results for ξ (that is, for ξ and ξ̄) that have already been established; see Sect. 3.2.2 below.
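The Leibniz-type rule (2.6b) invoked above is, under the common half-shift convention ∂_h v(x) = (v(x + h/2) − v(x − h/2))/h (our assumption, since (2.6b) itself is not reproduced in this excerpt), the exact identity ∂_h(vw)(x) = ∂_h v(x)·w(x + h/2) + v(x − h/2)·∂_h w(x), as a quick numerical check confirms:

```python
import math

def dd(f, x, h):
    """Central divided difference (partial_h f)(x) of width h."""
    return (f(x + h / 2) - f(x - h / 2)) / h

def leibniz_rhs(v, w, x, h):
    """Right-hand side of the discrete product rule."""
    return dd(v, x, h) * w(x + h / 2) + v(x - h / 2) * dd(w, x, h)

v, w = math.sin, math.exp
x, h = 0.3, 0.1
lhs = dd(lambda y: v(y) * w(y), x, h)
rhs = leibniz_rhs(v, w, x, h)
# Exact identity (both sides telescope to the same expression), so the
# difference is at floating-point level, not merely O(h).
assert abs(lhs - rhs) < 1e-12
```

Note that the identity is algebraic, not asymptotic, which is why it can be applied at every step of the error analysis without losing powers of h.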

Analysis for the second order divided difference
For α = 2, the DG scheme (2.2) takes a form that holds for any v_h ∈ V^α_h and j = 1, …, N. By Galerkin orthogonality and summing over all j, we obtain the error equation. To simplify notation, we denote ∂²_h e := ẽ = η̃ + ξ̃. If we now take v_h = ξ̃, we get the identity (3.19). The estimate for the right side of (3.19) is rather involved, since it contains integral terms with mixed order divided differences of ξ, namely ξ, ξ̄ and ξ̃; it is given in the following proposition.

Proposition 1 Suppose that the conditions in Theorem 2 hold. Then the estimate (3.20) holds,
where the positive constants involved are independent of h and u_h.
Proof By the second order Taylor expansion (3.9), we obtain a decomposition into terms, which will be estimated one by one below.
To estimate P, we use the Leibniz rule (2.6b) to rewrite it and split P into P₁, P₂ and P₃, where we have omitted the dependence on x for convenience when there is no confusion. A direct application of Lemma 1 together with the assumption f′(u) ≥ δ > 0, (2.11b), produces the estimate for P₁. By Corollary 1, we arrive at the estimates for P₂ and P₃. Substituting (3.4a)–(3.4c), (3.17) into (3.22b), (3.22c), and combining with (3.22a), we obtain, after a straightforward application of Young's inequality, the bound (3.23). For the terms on the right side of (3.23), although we have information about the jump seminorms of ξ and ξ̄, as shown in (3.4a) and (3.17), we still need a suitable bound for ‖ξ̄_x‖, which is given in the following lemma.

Lemma 5 Suppose that the conditions in Theorem 2 hold. Then the bound (3.24) for ‖ξ̄_x‖ holds,
where C depends on u and δ but is independent of h and u h .
The proof of this lemma is given in the appendix. Up to now, we see that we still need a bound for ‖ξ̄_t‖. In fact, the proof for ξ̄_t requires additional bounds for (ξ_t)_x and ξ_tt, whose results are shown in Lemmas 6 and 7.

Lemma 6
Suppose that the conditions in Theorem 2 hold. Then the corresponding bound for (ξ_t)_x holds. The proof of Lemma 6 follows from an argument similar to that in the proof of Lemma 5, so we omit the details here.

Lemma 7
Suppose that the conditions in Theorem 2 hold. Then the corresponding bound for ξ_tt holds. The proof of this lemma is deferred to the appendix. Based on the above two lemmas, we are able to prove the bound for ξ̄_t in Lemma 8, whose proof is also deferred to the appendix.

Lemma 8 Suppose that the conditions in Theorem 2 hold. Then the bound (3.27) for ‖ξ̄_t‖ holds,
where C depends on u and δ but is independent of h and u h .
We now collect the estimates of Lemmas 5 and 8 into (3.23). Let us now move on to the estimate of Q. By Corollary 2, we have a bound for Q. To estimate S, let us first employ the identity (2.6b) and rewrite ∂²_h(R₁e²). By an analysis similar to that in the proof of (3.13), we obtain a bound, where we have used the fact that, for k ≥ 1 and small enough h, Ch⁻¹(‖e‖∞ + ‖ē‖∞) ≤ C; for more details, see the appendix. Consequently, the bound for S follows. This finishes the proof of Proposition 1.
We are now ready to derive the L² norm estimate for ξ̃. To do so, we begin by combining (3.19) and (3.20). Next, we integrate the resulting inequality with respect to time between 0 and T and use ξ(0) = 0 (and thus ξ̃(0) = ∂²_h ξ(0) = 0), together with the estimates (3.4a) and (3.17). An application of Gronwall's inequality then leads to the desired result. This completes the proof of Theorem 2 with α = 2.
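As a sanity check on the notation used throughout this subsection, the α-th order divided difference is simply the α-fold composition of ∂_h, and for smooth functions it is a second-order accurate approximation of the α-th derivative (a numerical aside, under the half-shift convention for ∂_h assumed earlier):

```python
import math

def dd(f, h):
    """One application of the divided difference operator, returned as a function."""
    return lambda x: (f(x + h / 2) - f(x - h / 2)) / h

def dd_pow(f, h, alpha):
    """alpha-fold composition of dd: the alpha-th order divided difference."""
    for _ in range(alpha):
        f = dd(f, h)
    return f

u, x = math.sin, 0.3
# alpha = 2 reduces to (u(x+h) - 2u(x) + u(x-h)) / h^2, an O(h^2)
# approximation of u''(x) = -sin(x).
err = lambda h: abs(dd_pow(u, h, 2)(x) - (-math.sin(x)))
ratio = err(0.1) / err(0.05)
assert 3.8 < ratio < 4.2  # halving h quarters the error
```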

Remark 4
Through the proof of Theorem 2 with α = 2, we can see that, apart from the bounds for ξ, ξ_x, ξ_t that have already been obtained in proving the result for ξ̄, we require additional bounds for ξ̄_x, ξ̄_t, (ξ_t)_x, and ξ_tt, as shown in Lemmas 5–8. The proofs of the L² norm estimates for higher order divided differences are more technical and complicated, and require bounds on lower order divided differences as well as their spatial and time derivatives. For example, when α = 3, in addition to the aforementioned bounds, we need to establish the bounds for ξ_x, ξ_t, (ξ_t)_x, ξ_tt, (ξ_tt)_x and ξ_ttt at the next divided difference level. Thus, Theorem 2 can be proven along the same lines for general α ≤ k + 1. Finally, we would like to point out that the corresponding results on the jump seminorm for the various order divided differences and time derivatives of ξ play an important role in deriving Theorem 2.

The main results
In this section we consider the L² error estimates of divided differences for the variable coefficient equation (1.1) with f(u) = a(x)u. As in the nonlinear hyperbolic case, to obtain a suitable bound in the L² norm the numerical flux should be chosen as an upwind flux. Moreover, the analysis requires that |a(x)| be uniformly bounded below by a positive constant. Without loss of generality, we consider only a(x) ≥ δ > 0, and thus the DG scheme reads as before for v_h ∈ V^α_h; we will use the same notation. For nonlinear hyperbolic equations, the loss of order in Theorem 2 is mainly due to the lack of control of the interface jump terms arising from (2.11a) in the superconvergence relations, for example (3.4b), (3.24) and (3.25). Fortunately, for variable coefficient hyperbolic equations, we can establish a stronger superconvergence relation between the spatial derivatives as well as the interface jumps of the various order divided differences of ξ and its time derivatives; see (3.37b) below. Thus, optimal L² error estimates of order k + 1 are obtained.
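The upwind flux referred to here takes the trace from the inflow side of each interface. A minimal sketch (function and argument names are ours, not the paper's notation):

```python
def upwind_flux(a_iface, u_minus, u_plus):
    """Upwind numerical flux for f(u) = a(x) u at a cell interface.

    a_iface: value of a(x) at the interface; u_minus / u_plus: left / right
    traces of the DG solution.  The trace is taken from the side the
    characteristic comes from.
    """
    return a_iface * (u_minus if a_iface >= 0 else u_plus)

# Under the assumption a(x) >= delta > 0 made in the text, the left
# (upwind) trace is always selected.
assert upwind_flux(2.0, 1.0, -5.0) == 2.0
assert upwind_flux(-1.0, 1.0, -5.0) == 5.0
```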
Prior to stating our main theorem, we present convergence results for the time derivatives of ξ, which are slightly different from those for nonlinear hyperbolic equations; in the bounds below the positive constant C depends on u, T and a, but is independent of h.

Lemma 9 Let u be the exact solution of the variable coefficient hyperbolic Eq. (1.1) with f(u) = a(x)u, which is assumed to be sufficiently smooth with bounded derivatives, and let u_h be the numerical solution of the DG scheme above.
The proof of this lemma is postponed to the appendix.
We are now ready to state our main theorem.

Theorem 3 For a uniform mesh of Ω = (a, b), if the finite element space V^α_h of piecewise polynomials of arbitrary degree k ≥ 0 is used, then for any T > 0 there holds the following error estimate

‖∂^α_h (u − u_h)‖ ≤ Ch^{k+1}, (3.34)
where the positive constant C depends on u, δ, T and a, but is independent of h.
Remark 5 Based on the optimal error estimates for ∂^α_h ξ together with the approximation error estimates (3.3) and the duality argument in [19], we can obtain negative-order norm estimates of order 2k + 1. For more details, see [5,19] and also Sect. 4 below.

Proof of main results
We shall prove Theorem 3 for general α ≥ 1. First we claim that, if we can prove the three inequalities (3.37a)–(3.37c), in which the mixed operator containing divided differences and time derivatives of ξ is the one already defined in (3.5), then ‖∂^α_h ξ‖ ≤ Ch^{k+1}. In what follows, we sketch the verification of this claim. To do so, we start by taking v_h = ∂^α_h ξ in the corresponding error equation, where we also use the relations (3.37a)–(3.37c). Collecting the resulting estimates into (3.38) and using the Cauchy–Schwarz inequality as well as Gronwall's inequality, we finally get the claimed bound. The claim is thus verified.
In what follows, we will prove (3.37) by induction.
Step 1 For α = 1, ‖ξ‖ ≤ Ch^{k+1} is well known, and thus (3.37a) is valid for α = 1. Moreover, (3.37c), namely ‖ξ_t‖ ≤ Ch^{k+1}, has been given in (3.4c); see [18]. To complete the proof for α = 1, we need only establish the relation (3.37b). Proof Noting the relation (3.4b), we need only prove (3.40). To do so, we consider the cell error equation, which holds for any v_h ∈ V^α_h and j = 1, …, N. If we now take v_h = 1 in this identity and use the strong form (2.3b) for H_j(aξ, v_h), we obtain (3.41). It follows from the assumption |a(x)| ≥ δ > 0 that the interface term in (3.41) is controlled. By the Cauchy–Schwarz inequality we obtain one estimate, and by the definition of the projection P⁻_h, (2.9b), another. Inserting the above two estimates into (3.41), we arrive at the desired inequality, where we have used the bound for ‖ξ‖, the relation (3.4b) and the approximation error estimates (2.10a); thus (3.40) follows. Therefore, (3.37) is valid for α = 1.
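Both the L² projection and the Gauss–Radau-type projection P⁻_h enter the argument only through their O(h^{k+1}) approximation property. As a numerical aside, the plain L² projection onto P^k on a reference cell can be sketched with a Legendre basis (function names and the quadrature degree are our choices):

```python
import numpy as np
from numpy.polynomial import legendre as leg

def l2_project(u, k):
    """L2 projection of u onto P^k on the reference cell [-1, 1]."""
    xq, wq = leg.leggauss(k + 4)  # quadrature accurate enough for these integrands
    coeffs = []
    for m in range(k + 1):
        Lm = leg.Legendre.basis(m)
        # <u, L_m> / <L_m, L_m>, with <L_m, L_m> = 2 / (2m + 1)
        coeffs.append(np.dot(wq, u(xq) * Lm(xq)) * (2 * m + 1) / 2)
    return leg.Legendre(coeffs)

# On a cell of size h, the max projection error of a smooth u scales like
# h^{k+1}: project t -> u(x0 + h t / 2) on [-1, 1]; halving h should shrink
# the error by about 2^{k+1} = 8 for k = 2.
u, x0, k = np.sin, 0.3, 2
def err(h):
    uh = l2_project(lambda t: u(x0 + h * t / 2), k)
    t = np.linspace(-1.0, 1.0, 201)
    return np.max(np.abs(u(x0 + h * t / 2) - uh(t)))
ratio = err(0.2) / err(0.1)
assert 6.0 < ratio < 10.0
```

P⁻_h itself additionally matches the exact downwind interface value; the scaling of its error in h is the same.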
Step 2 Suppose that (3.37) is true for α = ℓ; let us prove that it also holds for α = ℓ + 1. First, as shown in our claim, (3.42) implies that ‖∂^ℓ_h ξ‖ ≤ Ch^{k+1}. This estimate together with (3.42a) produces the corresponding bound, and therefore (3.37a) is valid for α = ℓ + 1. Next, by assumption (3.42b), we see that to show (3.37b) for α = ℓ + 1 we need only establish the remaining relation. Without loss of generality, let us take β = (ℓ, 0) as an example. To this end, we consider the following error equation, which holds for any v_h ∈ V^α_h. We use the Leibniz rule (2.6b) to write out ∂^{ℓ+1}_h(aξ) as

Therefore, the error equation becomes
Let us now work on Z₀. By the strong form of H, (2.4b), we obtain an expression for Z₀. Denote by L_k the standard Legendre polynomial of degree k on [−1, 1]. Expanding the relevant quantity in terms of L_k, substituting the resulting expression into (3.44), and taking into account the assumption that a(x) ≥ δ > 0, we obtain a lower bound involving Z₀,₁ and Z₀,₂. It is easy to show by Corollary 1 a bound for the first group of terms, where we have used (3.42a)–(3.42c), since ℓ − i ≤ ℓ − 1 for i ≥ 1. By Corollary 2, we obtain another bound. By (3.43) and the inverse property (i), we arrive at a bound for Z₀,₁ and Z₀,₂. The triangle inequality and the approximation error estimate (3.3) yield a further bound. Collecting the estimates (3.46a)–(3.46d) into (3.45) and using the fact that ‖v_h‖ ≤ C‖(∂^ℓ_h ξ)_x‖, we arrive at (3.47). If we take v_h = 1 in the cell error equation and use an analysis similar to that in the proof of (3.40), we obtain the relation (3.48). A combination of (3.47) and (3.48) gives the desired estimate. Therefore, (3.37b) still holds for α = ℓ + 1.
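The two Legendre polynomial facts used in this step (and again in the appendix), namely L_k(±1) = (±1)^k and orthogonality to all polynomials of lower degree, can be checked directly (a numerical aside using NumPy's Legendre module):

```python
import numpy as np
from numpy.polynomial import legendre as leg

for k in range(6):
    Lk = leg.Legendre.basis(k)
    # Endpoint values: L_k(1) = 1 and L_k(-1) = (-1)^k.
    assert abs(Lk(1.0) - 1.0) < 1e-12
    assert abs(Lk(-1.0) - (-1.0) ** k) < 1e-12
    # Orthogonality on [-1, 1] to every monomial of degree m < k,
    # checked with Gauss-Legendre quadrature exact for these degrees.
    xq, wq = leg.leggauss(k + 2)
    for m in range(k):
        assert abs(np.dot(wq, Lk(xq) * xq**m)) < 1e-12
```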

Superconvergent error estimates
For nonlinear hyperbolic equations, the negative-order norm estimate of the DG error itself has been established in [16]. However, by the post-processing theory [5,11], negative-order norm estimates of divided differences of the DG error are also needed to obtain superconvergent error estimates for the post-processed solution in the L² norm. Using a duality argument together with the L² norm estimates established in Sect. 3, we show that, for a given time T, the α-th order divided difference of the DG error in the negative-order norm achieves (2k + 3/2 − α/2)th order superconvergence. As a consequence, the DG solution u_h(T) converges with at least (3k/2 + 1)th order in the L² norm when convolved with a particularly designed kernel.
We are now ready to state our main theorem about the negative-order norm estimates of divided differences of the DG error.

Theorem 4 For a uniform mesh of Ω = (a, b), if the finite element space V^α_h of piecewise polynomials of arbitrary degree k ≥ 1 is used, then for small enough h and any T > 0 there holds the following error estimate, where the positive constant C depends on u, δ, T and f, but is independent of h.
Combining Theorems 4 and 1, we have

Corollary 5 Under the same conditions as in Theorem 4, if in addition K^{ν,k+1}_h is a convolution kernel consisting of ν = 2k + 1 + ω (ω ≥ −k/2) B-splines of order k + 1, such that it reproduces polynomials of degree ν − 1, then we have the corresponding superconvergent bound for the post-processed error.

Remark 6
The (3k/2 + 1)th order superconvergence is shown in the negative (k + 1) norm, and is thus valid for B-splines of order k + 1 (by Theorem 1). For a general order ℓ of B-splines and α ≤ ℓ, using an argument similar to the proof of the negative (k + 1) norm estimates (see Sect. 4.1), we can prove an analogous superconvergent error estimate. Therefore, from the theoretical point of view, a higher order of B-splines may lead to a superconvergence result of higher order, for example ℓ = k + 1, which gives the (3k/2 + 1)th order in Corollary 5. However, from the practical point of view, changing the order of B-splines does not affect the order of superconvergence; see Sect. 5 below and also [17].
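As a concrete illustration of the kernel construction (our own worked example, not taken from the paper): for k = 1 the standard kernel uses ν = 2k + 1 = 3 B-splines of order 2 (hat functions) centered at the integer shifts γ = −1, 0, 1, and the coefficients follow from the moment conditions that make the kernel reproduce polynomials of degree ν − 1 = 2, using the hat-function moments ∫ψ(x − γ)dx = 1, ∫xψ(x − γ)dx = γ, ∫x²ψ(x − γ)dx = γ² + 1/6:

```python
import numpy as np

# Shifts of the three order-2 B-splines (hats) in the k = 1 kernel.
gamma = np.array([-1.0, 0.0, 1.0])
# Moment matrix: row m holds the m-th moment of psi(x - gamma_i).
M = np.vstack([np.ones(3), gamma, gamma**2 + 1.0 / 6.0])
rhs = np.array([1.0, 0.0, 0.0])  # reproduce 1, x, x^2 exactly
c = np.linalg.solve(M, rhs)
# These are the classical symmetric SIAC coefficients for k = 1.
assert np.allclose(c, [-1.0 / 12.0, 7.0 / 6.0, -1.0 / 12.0])
```

For larger k and ν the same moment system is solved with higher-order B-spline moments; the more compact kernels discussed below simply use fewer shifts (smaller ν).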

Proof of the main results in the negative-order norm
Similar to the proof of the L² norm estimates of the divided differences in Sect. 3.2, we will only consider the case f′(u(x, t)) ≥ δ > 0. To perform the analysis for the negative-order norm, by (2.5), we need to concentrate on the estimate of (∂^α_h(u − u_h)(T), Φ) for Φ ∈ C₀^∞(Ω). To do so, we use the duality argument, following [16,19]. For the nonlinear hyperbolic Eq. (2.1), we choose the dual equation as: find a function ϕ such that ϕ(·, t) is periodic for all t ∈ [0, T] and (4.4a)–(4.4b) hold. Unlike the purely linear case [11,15] or the variable coefficient case [19], the dual equations for nonlinear problems no longer preserve the inner product of the original solution ∂^α_h u and its dual solution ϕ; that is, (d/dt)(∂^α_h u, ϕ) = 0 fails in general. In fact, if we multiply (2.1a) by ϕ and (4.4a) by (−1)^α u and integrate over Ω, we obtain, after using integration by parts and summation by parts (2.6d), a relation involving a nonlinear remainder F(u; ϕ). Note that F(u; ϕ) is the same as that in [16] when α = 0. We now integrate (4.5) with respect to time between 0 and T to obtain a relation linking (∂^α_h u, ϕ) at different time levels. In what follows, we work on the estimate of (4.3). To do so, let us begin by using the relation (4.6) to get an equivalent form of (4.3), whose terms G₁, G₂ and G₃ will be estimated one by one below. Note that in our analysis of ∂^α_h(u − u_h)(T) in Theorem 2, we need to choose a particular initial condition, rather than the standard one used for purely linear equations [11,15]. Thus, we arrive at a slightly different bound for G₁, as shown in the following lemma. We note that using the L² projection in the numerical examples is still sufficient to obtain superconvergence.
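For reference, the negative-order norm appearing in (2.5) is the standard one (as in [5,19]); for integer ℓ > 0,

```latex
\Vert u\Vert_{-\ell,\Omega}
  \;=\; \sup_{\Phi\in C_0^\infty(\Omega)}
        \frac{(u,\Phi)_{\Omega}}{\Vert\Phi\Vert_{\ell,\Omega}},
```

so bounding (∂^α_h(u − u_h)(T), Φ) uniformly over normalized test functions Φ is exactly what is needed.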

Lemma 10 (Projection estimate) There exists a positive constant C, independent of h, such that the bound (4.7) on G₁ holds.
Proof Using the particular choice of the numerical initial condition, we have an identity for G₁; a combination of the Cauchy–Schwarz inequality and the approximation error estimates (2.10a) then leads to the desired result (4.7).
The bound for G 2 is given in the following lemma.

Lemma 11 (Residual) There exists a positive constant C, independent of h, such that the bound (4.8) on G₂ holds.

Proof Denoting by G the term inside the time integral of G₂ and taking χ = P_k ϕ, we obtain an expression for G, which we rewrite by adding and subtracting the term (∂^α_h f(u), (ϕ − P_k ϕ)_x) and integrating by parts.
Let us now consider the estimates of the resulting three terms G₁, G₂, G₃. For G₁, by using the second order Taylor expansion of f(u) − f(u_h), (3.9), we split G₁ = G₁^lin + G₁^nlr, where G₁^lin and G₁^nlr represent, respectively, the linear and the nonlinear part of G₁. It is easy to show, by using the Leibniz rule (2.6b) and the Cauchy–Schwarz inequality, a bound for G₁^lin, where we have used the estimate for ∂^{α−1}_h e in Corollary 3 and the approximation error estimate (2.10a). Analogously, for the high order nonlinear term G₁^nlr, we obtain a bound, where we have used (2.6b) twice, the inverse property (iii), the L² norm estimate (3.2), and the approximation error estimate (2.10a). A combination of the above two estimates yields (4.10). To estimate G₂, we use an analysis similar to that in the proof of Lemma 10 and make use of the orthogonality property of the L² projection P_k to get (4.11), where we have used the approximation error estimate (2.10a).
We proceed to estimate G₃. It follows from the Taylor expansion (3.9), the Leibniz rule (2.6b), the Cauchy–Schwarz inequality and the inverse properties (ii), (iii) that G₃ is bounded, where we have also used (3.2) and (2.10a). Collecting the estimates (4.10)–(4.12), we get |G| ≤ Ch^{2k+3/2−α/2}‖ϕ‖_{k+1}. Consequently, the estimate for G₂ follows by integrating this inequality with respect to time.
We move on to the estimate of G 3 , which is given in the following lemma.

Lemma 12 (Consistency) There exists a positive constant C, independent of h, such that the bound (4.14) on G₃ holds.

Proof Let us denote by G₄ the term inside the time integral of G₃ and take into account (2.6d) to obtain an equivalent form of G₄, where we have used the dual problem (4.4) and the fact that [[ϕ]] = 0 due to the smoothness of ϕ. Next, by using the second order Taylor expansion (3.9) and (2.6d) again, we arrive at a further decomposition. If we now use (2.6b) twice for ∂^α_h(R₁e²) and apply the Cauchy–Schwarz inequality together with the error estimate (3.2), we obtain a bound for G₄, where we have also used the Sobolev inequality ‖ϕ_x‖∞ ≤ C‖ϕ‖_{k+1}, valid under the condition k > 1/2. The bound for G₃ follows immediately by integrating this inequality with respect to time.
We are now ready to obtain the final negative-order norm error estimates for the divided differences. By collecting the results in Lemmas 10–12 and taking into account the regularity result in Lemma 3, namely ‖ϕ‖_{k+1} ≤ C‖Φ‖_{k+1}, we get a bound for (4.3). Thus, by (2.5), we have the bound for the negative-order norm. This finishes the proof of Theorem 4.

Numerical examples
For nonlinear hyperbolic equations, we proved L² norm superconvergence results of order 3k/2 + 1 for post-processed errors, as shown in Corollary 5. These superconvergence results, together with the post-processing theory of Bramble and Schatz in Theorem 1, enable us to design a more compact kernel that achieves the desired superconvergence order. We note that superconvergence of post-processed errors using the standard kernel (a kernel composed of a linear combination of 2k + 1 B-splines of order k + 1) for nonlinear hyperbolic equations has been studied numerically in [11,16]. Since the order of B-splines does not have a significant effect on the rate of convergence numerically, while it is the number of B-splines that has the greater effect on the convergence order theoretically [11], we focus only on the effect of different total numbers (denoted by ν = 2k + 1 + ω with ω ≥ −k/2) of B-splines in the kernel in our numerical experiments. For more numerical results using different orders of B-splines, we refer the reader to [17].
We consider the DG method combined with the third-order Runge–Kutta method in time, taking a time step small enough that the spatial errors dominate. To save space, we present results for P² and P³ polynomials only, with a specific value of ω chosen to match the orders given in Corollary 5. For the numerical initial condition, we take the standard L² projection of the initial condition. Noting that f′(u) changes its sign in the computational domain, we use the Godunov flux, which is an upwind flux. The errors at T = 0.3, when the solution is still smooth, are given in Table 1 (before post-processing, after post-processing with the standard kernel, and after post-processing with the more compact kernel). From the table, we can see that the order of convergence improves from k + 1 to at least 2k + 1, similar to the results for the Burgers equation in [11]. Moreover, superconvergence of order 2k can be observed for the compact kernel with ω = −2, since, in general, a symmetric kernel can yield one additional order; this is why ω = −2 is chosen in our kernel instead of ω = −k/2 = −1. The pointwise errors are plotted in Fig. 1, which shows that the post-processed errors are less oscillatory and much smaller in magnitude for a large number of elements, as observed in [11], and that the errors of the more compact kernel with ω = −2 are less oscillatory than those of the standard kernel with ω = 0, although their magnitude increases. This example demonstrates that the superconvergence result also holds for conservation laws with a general flux function.
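For a scalar conservation law, the Godunov flux used above admits the closed form F̂(u⁻, u⁺) = min f on [u⁻, u⁺] if u⁻ ≤ u⁺, and max f on [u⁺, u⁻] otherwise. A sampling-based sketch (an exact implementation would locate the critical points of f analytically):

```python
import numpy as np

def godunov_flux(f, u_minus, u_plus, n=1001):
    """Godunov flux for u_t + f(u)_x = 0:
    min of f on [u-, u+] if u- <= u+, else max of f on [u+, u-].
    Approximated by dense sampling (exact when the extremum is sampled).
    """
    lo, hi = min(u_minus, u_plus), max(u_minus, u_plus)
    vals = f(np.linspace(lo, hi, n))
    return vals.min() if u_minus <= u_plus else vals.max()

burgers = lambda u: 0.5 * u**2
# Rarefaction case: minimum of u^2/2 over [-1, 1] sits at the sonic point u = 0.
assert abs(godunov_flux(burgers, -1.0, 1.0)) < 1e-12
# Shock case: maximum over [-1, 1] is attained at the endpoints.
assert abs(godunov_flux(burgers, 1.0, -1.0) - 0.5) < 1e-12
```

When f′ does not change sign between the traces, this reduces to the plain upwind flux, consistent with the choice made in the analysis.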
with periodic boundary conditions.
We test Example 2 at T = 0.1, before the shock develops. The orders of convergence with different kernels are listed in Table 2, and the pointwise errors are plotted in Fig. 2. We can see that the post-processed errors are less oscillatory and much smaller in magnitude for most elements, as observed in [16], and that the errors of the more compact kernel with ω = −2 are slightly less oscillatory than those of the standard kernel with ω = 0. This example demonstrates that the accuracy-enhancement technique also applies to conservation laws with a strong nonlinearity that is not a polynomial in u.

Concluding remarks
In this paper, the accuracy-enhancement of the DG method for nonlinear hyperbolic conservation laws is studied. We first prove that the α-th order divided difference of the DG error in the L² norm is of order k + 3/2 − α/2 when piecewise polynomials of degree k and upwind fluxes are used, provided that |f′(u)| is uniformly bounded below by a positive constant. Then, by a duality argument, the corresponding negative-order norm estimates of order 2k + 3/2 − α/2 are obtained, ensuring that the SIAC filter achieves at least (3k/2 + 1)th order superconvergence. As a by-product, we show, for variable coefficient hyperbolic equations with f(u) = a(x)u, optimal error estimates of order k + 1 in the L² norm for divided differences of the DG error, provided that |a(x)| is uniformly bounded below by a positive constant. Consequently, the superconvergence result of order 2k + 1 is obtained in the negative-order norm. Numerical experiments are given which show that post-processed errors using more compact kernels are less oscillatory and that the superconvergence property holds for nonlinear conservation laws with general flux functions, indicating that the restriction on f′(u) is artificial. Based on our numerical results, we can see that these estimates are not sharp; they do, however, indicate that a more compact kernel can be used to obtain superconvergence. Future work includes the study of accuracy-enhancement of the DG method for one-dimensional nonlinear symmetric/symmetrizable systems and for scalar nonlinear conservation laws in multiple dimensions on structured as well as unstructured meshes. Analysis of the superconvergence property of the local DG (LDG) method for nonlinear diffusion equations is also ongoing work.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

The proof of Lemma 5
Let us prove the relation (3.24) in Lemma 5. Use the Taylor expansion (3.9) and the identity (2.6b) to rewrite the nonlinear terms; this allows the error Eq. (3.6) to be written as a sum of terms Yᵢ (i = 1, …, 5). In what follows, we estimate each term separately. First consider Y₁. Begin by using the strong form of H, (2.4b). Next, let L_k be the standard Legendre polynomial of degree k on [−1, 1], so that L_k(−1) = (−1)^k and L_k is orthogonal to all polynomials of degree at most k − 1. Choosing the test function accordingly and noting that (ξ̄_x, L_k)_j = 0, we arrive at an equivalent form of Y₁. By the inverse property (ii), the required bounds follow easily. Plugging the above results into (7.1) and using the assumption that f′(u(x, t)) ≥ δ > 0, we get (7.4). We shall estimate the terms on the right side of (7.4) one by one below. For Y₂, by the strong form of H, (2.4b), and the Cauchy–Schwarz inequality, we arrive at a bound for Y₂. A direct application of Corollary 2 leads to a bound for Y₃. By an analysis similar to that in the proof of (3.13), the next term is bounded. By the Cauchy–Schwarz inequality, and then the Cauchy–Schwarz inequality again together with the inverse property (i), the remaining terms are bounded. The triangle inequality and the approximation error estimate (3.3) complete these bounds. Finally, the error estimate (3.24) follows by collecting the estimates (7.5a)–(7.5g) into (7.4) and using the estimates (3.4a)–(3.4c), (3.17) and (3.14). This finishes the proof of Lemma 5.

The proof of Lemma 7
To prove the error estimate (3.26), it is necessary first to bound the initial error ‖ξ_tt(0)‖. To do so, we start by noting that ξ(0) = 0 and that ‖ξ_t(0)‖ ≤ Ch^{k+1}, which have already been proved in [18, Appendix A.2]. Next, note also that the first order time derivative of the original error equation holds for any v_h ∈ V^α_h. If we now let v_h = ξ_tt(0) and use an argument similar to the proof of the bound for ‖ξ_t(0)‖ in [18], we arrive at the bound (7.6) for ‖ξ_tt(0)‖. We then move on to the estimate of ‖ξ_tt(T)‖ for T > 0. To this end, we take the second order derivative of the original error equation with respect to t and let v_h = ξ_tt to get (7.7). To estimate the right-hand side of (7.7), we use the Taylor expansion (3.9) and the Leibniz rule for partial derivatives to rewrite ∂_tt(f(u) − f(u_h)) as (∂_tt f′(u))ξ + 2(∂_t f′(u))ξ_t + f′(u)ξ_tt + (∂_tt f′(u))η + 2(∂_t f′(u))η_t + f′(u)η_tt − (∂_tt R₁)e² − 2(∂_t R₁)∂_t e² − R₁(∂_tt e²) ≜ λ₁ + ··· + λ₉.
Therefore, the right side of (7.7) can be written as H(∂_tt(f(u) − f(u_h)), ξ_tt) = Θ₁ + ··· + Θ₉ (7.8) with Θᵢ = H(λᵢ, ξ_tt) (i = 1, …, 9), which will be estimated one by one below. By (2.11a) in Lemma 1, it is easy to bound Θ₁, where we have used the estimates (3.4a)–(3.4c) and Young's inequality. Analogously we bound Θ₂, where we have also used the estimate (3.4c) and the relation (3.25). A direct application of (2.11b) in Lemma 1 together with the assumption that f′(u) ≥ δ > 0 leads to the estimate for Θ₃. Noting that η_t = u_t − P⁻_h(u_t) and η_tt = u_tt − P⁻_h(u_tt), we have, by Lemma 2, |Θ₄| + |Θ₅| + |Θ₆| ≤ Ch^{k+1}‖ξ_tt‖. (7.9d) By an analysis similar to that in the proof of (3.13), we get |Θ₇| ≤ Ch⁻¹‖e‖∞(‖ξ‖ + h^{k+1})‖ξ_tt‖. Note that the result of Lemma 7 is used to prove the convergence result for the second order divided difference of the DG error, which implies k ≥ 1. Therefore, by the inverse property (iii), the superconvergence results (3.4a), (3.4c), and the approximation error estimate (2.10b), we have for small enough h that the above factor is bounded by a positive constant C independent of h; consequently Θ₇ is controlled. Collecting the estimates (7.9a)–(7.9g) into (7.7) and (7.8), we get, after a straightforward application of the Cauchy–Schwarz and Young inequalities, a differential inequality, where we have used the estimates (3.4a) and (3.4c) in the last step. Now, we integrate this inequality with respect to time between 0 and T and combine with the initial error estimate (7.6). By the estimates (3.4a) and (3.4c) again, we arrive at an integral inequality for ‖ξ_tt‖². Finally, Gronwall's inequality gives the desired estimate (3.26), which completes the proof of Lemma 7.

The proof of Lemma 8
To prove the error estimate (3.27), it is necessary first to bound the initial error ‖ξ̄_t(0)‖. To do so, we start by noting that ξ(0) = 0, and thus ξ̄(0) = 0, due to the choice of the initial data. Next, note also that the error equation (3.6) still holds at t = 0 for any v_h ∈ V^α_h. If we now let v_h = ξ̄_t(0) and use an argument similar to the proof of the bound for ‖ξ_t(0)‖ in [18], we arrive at the bound ‖ξ̄_t(0)‖ ≤ Ch^{k+1}. (7.12) We then move on to the estimate of ‖ξ̄_t(T)‖ for T > 0. To obtain this, take the time derivative of the error Eq. (3.6) and let v_h = ξ̄_t to get (7.13). To estimate the right-hand side of (7.13), we use the Taylor expansion (3.9) and the Leibniz rule (2.6b) to rewrite ∂_t ∂_h(f(u) − f(u_h)) as ∂_t f′(u(x + h/2)) ξ̄(x) + ∂_h(∂_t f′(u)) ξ(x − h/2) + f′(u(x + h/2)) ξ̄_t(x) + ⋯ − ∂_h R₁ ∂_t e²(x − h/2) − ∂_t R₁(u(x + h/2)) ∂_h e² − ∂_h(∂_t R₁) e²(x − h/2) ≜ π₁ + ··· + π₁₀.
This allows the right side of (7.13) to be written as H(∂_t ∂_h(f(u) − f(u_h)), ξ̄_t) = Π₁ + ··· + Π₁₀ (7.14) with Πᵢ = H(πᵢ, ξ̄_t) for i = 1, …, 10, which are estimated separately below. By (2.11a) in Lemma 1, it is easy to bound Π₁, where we have used the estimate (3.17), the relation (3.24), and Young's inequality. Analogously, for Π₂ and Π₄, we apply Corollary 1, where we have also used the estimates (3.4a)–(3.4c) and the relation (3.25). A direct application of (2.11b) in Lemma 1 together with the assumption that f′(u) ≥ δ > 0 leads to the estimate for Π₃. Noting that η_t = u_t − P⁻_h(u_t), we obtain the corresponding bounds by Corollary 2. By an analysis similar to that in the proof of (3.13), we get (7.15g) and |Π₉| ≤ C(‖ξ‖ + h^{k+1})‖ξ̄_t‖. (7.15h) Collecting the estimates (7.15a)–(7.15i) into (7.13) and (7.14), we get, after a straightforward application of the Cauchy–Schwarz and Young inequalities, a differential inequality, where we have used the estimates (3.4a), (3.4c), (3.17) and (3.26) in the last step. Now, we integrate this inequality with respect to time between 0 and T and combine with the initial error estimate (7.12). By the estimates (3.4a), (3.4c) and (3.17) again, we arrive at an integral inequality. Finally, Gronwall's inequality gives the desired result, completing the proof of Lemma 8.