Curve Fitting with Julia
Subtitle: Why Oil Production May Not Follow a Gaussian Function
“How did you go bankrupt?” Bill asked.
“Two ways,” Mike said. “Gradually, then suddenly.”—Ernest Hemingway, The Sun Also Rises
# The Weierstrass Approximation Theorem
In 1885, Karl Weierstrass
proved that any continuous function can be approximated as closely as you like with a polynomial. Written in mathematical terms, the theorem he proved is
Theorem: Suppose is continuous. For any , there exists a polynomial such that
What does all this mean? Let’s start with . This says that the function takes any value of between and and returns a real-valued number, and can be anything. A function is continuous if it exists at every and you can trace the curve without lifting your pencil.
The part about “For any ” is a trick mathematicians use a lot, and it means if you think of some very small positive number, we can always find another even smaller, but still positive number. The way we use this very, very tiny is to take the absolute value of the difference between and at every and make sure it’s less than .
The is not “'Sup, dude?”, but it’s short for supremum and it means that for all possible polynomials approximating , we want to choose the one where the maximum absolute value difference between and for any in the interval is the minimum. In other words, we want to choose the which minimizes the biggest error.
You might be wondering why the values are constrained to be between and , and the answer is, it doesn’t matter. You can shrink, stretch and shift to fit any interval. Suppose your function is defined over the interval . Then, for every , make
so when , and when , . Then modify your function to
and you get the same values as , but over the interval .
A theorem isn’t a theorem until it’s been proved. Weierstrass proved his theorem, but in 1912, Sergei Bernstein
used Bernstein basis polynomials to construct approximation functions. In “Yacht Design with Mathematics” we used Bernstein polynomials as the basis for Bezier functions. For any , the Bernstein basis polynomials are defined as
with in the interval . The Binomial Theorem shows how to expand the polynomial in and to the power,
Substitute , add up all Bernstein polynomials and you get,
for any . Bernstein then defined a new function which included a continuous function on the interval to be
Michael Bertrand gives the complete version of Bernstein’s proof in his article, “Bernstein Proves the Weierstrass Approximation Theorem”, and I’ll provide another version at the end.
In 1938, Marshall Stone
extended Weierstrass’ Theorem which is now known as the Stone-Weierstrass Theorem.
# Some Applications
What can you do with a polynomial approximation to a function? With a good polynomial fit, you can use the polynomial to estimate the derivative or the integral of , and this is what is done with software packages like Chebfun for Matlab, ApproxFun in Julia, or Python libraries such as pychebfun, ChebPy, or PaCAL. Polynomials are easy to differentiate and integrate, which is often not true for arbitrary functions, so having a good approximation is quite useful. These software tools can approximate continuous functions to machine precision, and computing with polynomials is fast.
With an approximate derivative or integral you can solve differential equations, and this has led to applications in neural networks as described in “Universal Invariant and Equivariant Graph Neural Networks”. For insight into how Chebfun works, see “Chebfun: A New Kind of Numerical Computing” by Platte and Trefethen. The Chebfun website has many examples,
- Rational approximation of the Fermi-Dirac function describing the distribution of energy states for systems of particles.
- A drowning man and Snell’s Law is derived from Fermat’s principle giving the refraction of light through different media.
- Exponential, logistic, and Gompertz growth. “The Boonie Conundrum” used the Gompertz function to model the spread of a rumor.
- Procrustes shape analysis. We talked about the Greek myth of Procrustes in “Designing the Model”.
- Modeling infectious disease outbreaks. In “How Deep Is That Pool?” we showed how to calculate the optimal pool size for COVID-19 testing.
- Smoothies: nowhere analytic functions. Weierstrass found functions that are continuous everywhere, but not differentiable anywhere.
You’ll find plenty of nerdy-cool projects to try out. The only negative aspect of Chebfun is it requires the commercial software Matlab, but you may find an open-source version you prefer in Chebfun-related projects.
Polynomials aren’t the only way to approximate functions. Fourier approximation and wavelets are other methods, and Nick Trefethen has written the book, “Approximation Theory and Approximation Practice” (partially available online). See the Publications page for other references.
Something you should never do is fit a function with a high degree polynomial and then extrapolate beyond the limits of . You only get a good fit over the interval where you defined .
Here, the function
may be approximated by the polynomial
Between and the fit is pretty good, but outside the interval, the fit is much worse.
# A Naive Approach to Approximations
Suppose you knew nothing about approximation theory and decided to try fitting one function (or data) with the sum of a simpler function?
Marion King Hubbert was a geologist, mathematician, and physicist who worked for Shell Oil, the USGS, and taught at Columbia, Stanford, and UC Berkeley. In 1956, he looked at oil production data and proposed a model for the amount of oil extracted from a field,
where is the total quantity of oil to eventually be produced from the field, is the cumulative production up to time , and constants and are parameters defined by local geology. This is the logistic function first described by Pierre Francois Verhulst to model population growth. The derivative of the logistic function produces a bell-shaped curve (or Gaussian function), which for Hubbert was the instantaneous rate of oil flow from the field.
Hubbert used the logistic function because initially with only a few wells production would be low, but as more came on line production should increase. Eventually, though, depletion would slow the rate of production. Hubbert thought his model could be extended to all U.S. oil production and predicted the peak of production would occur in 1970 for the lower 48 states.
In 1976 Hubbert predicted the world peak in production might happen as soon as 1995. A 2007 study by the UK Energy Resource Center found no geological reason to expect cumulative oil production should follow a logistic curve. They reported on field data suggesting peaks tended to happen at about 25% of the total production .
On the other hand, Ugo Bardi says decline is faster than growth and likes to quote Seneca,
“It would be some consolation for the feebleness of our selves and our works if all things should perish as slowly as they come into being; but as it is, increases are of sluggish growth, but the way to ruin is rapid.” Lucius Anneaus Seneca, Letters to Lucilius, n. 91
We can model individual fields using Gaussian functions,
where represents the peak amount produced, is the time of peak production and determines the width of the bell curve. The Ghawar field in Saudi Arabia produces tens of thousands of barrels per day (), has been in operation since , and continues to this day so is large. More recently, U.S. shale wells have increased production (the green curve departing from Hubbert’s prediction), but their peaks are much lower and production drops 90% from the peak after only three years.
Is it possible to generate Bardi’s Seneca Cliff with a series of Gaussians? Using Julia, the Seneca Cliff can be modeled as
f(x) = 0 < x <= 1 ? x : 0
which asks the question, Is between and ? If it is, , otherwise it’s zero.
If oil production mimicked this , the rate would increase linearly until suddenly every oil well in the world stopped producing on the same day. This is about as likely as Zaphod Beeblebrox’s Infinite Improbability Drive, but we’re just trying to build a simple mind-sized problem. “See, all my procedures are mind-sized bites”, explains seventh-grader Robert in Seymour Papert’s Mindstorms.
First we need to create a vector of values,
x = Array(-0.5:0.01:1.5));
and then the corresponding values using a list comprehension
y = [f(x_i) for x_i in x];
Next, for each value of the parameters and we can define a function and evaluate it over ,
g(x,α,μ,σ) = sum( α[i] * exp( - (x-μ[i])^2/(2*σ[i]^2) ) for i = 1:n)
or
which gives an estimate for using the sum of Gaussians
y_fit = [g(x_i,α,μ,σ) for x_i in x]
Using the LsqFit.jl package, we’ll get the best parameters to fit . LsqFit requires an initial estimate, so I chose
and then created an initial vector with . To get the fit, LsqFit uses the function curve_fit
fit = curve_fit(gaussSum, x, y, p₀, lower = lb, upper = ub)
where gaussSum
contains the definition for , extracts the parameters from , and generates . The values of the parameters and are required to be between the lower bound lb
and the upper bound ub
which were set to zero and one for this example.
The result is a pretty good approximation
You can see the peaks of each Gaussian with centers and widths . The optimal amplitudes are . The function has a discontinuity at , so we wouldn’t be able to find a polynomial approximation. Smoothing over the discontinuity would help with the sum of Gaussians technique. This doesn’t mean this model necessarily represents what’s actually happening, but it shows that a sum of Gaussian-like production curves can still lead to a Seneca Cliff. The Julia code is on GitHub.
This is a good experimental start to finding a method for fitting Gaussians to functions. Is there a pattern to the values for and ? Can you find a functional representation for the parameters that works for different ? What happens if you change ? Can you find a general rule giving the parameters for any and any ? Test your hypotheses by changing (line 17) and plotting the fit.
# The Proof
Here’s the proof of Weierstrass’ theorem. It’s actually pretty easy to follow if you take it in mind-size bites. It comes from Chris Tisdell, a mathematician at UNSW Sydney. The proof below was taken from his YouTube video and is based on the proof by Dunham Jackson.
Theorem: Suppose is a continuous function. Then we can construct a sequence of polynomials such that for any there exists such that for all we have
Proof: Assume . Extend to the entire real line by defining
This just adds pieces to so that it covers the entire real line by connecting to and to , and then defining for any point outside the interval , (and it’s still continuous)
Now, create a sequence of constants (called a Landau integral),
Here are some examples of the Landau integral, and the numeric values over the interval . Integration gives the area below the curve and above the axis.
Next, define polynomials using these constants,
The first three polynomials of the sequence are
The idea is when gets to be big enough, will very closely match . Notice that the degree of is at most .
Since for every outside of the interval then
Changing the limits of integration from 0 to 1 in the first line to to in the second line doesn’t really do anything for except it now runs over instead. In other words, let
so
where the first and third terms are zero since is zero on those intervals, leaving the middle term unchanged.
If we change variables, letting so , then . The limits of the integral change from
Now, look at the difference between this polynomial approximation and the function we want to approximate ,
where we use the Landau integral
Since is continuous, then for any there is a such that
If is the maximum value of then
since and , then
For any , then , so
and therefore
In either case, must be less than the sum of these two bounds, so
With these bounds, we can write
Let
and then using integration by parts () and letting
So this means the integral can be bounded by ,
In other words,
so long as is sufficiently large such that