The other day I stumbled upon a problem on leetcode - leet code 478.
The problem statement essentially is:
Write a function \(randPoint()\) that generates a uniform random point in the unit circle.
There are 2 approaches that come to mind: \(\chi^2\) test and Kolmogorov-Smirnov test. However, below I present much simpler approach that would suffice for the problem at hand.
In essense, split the circle into small squares and count frequency how many times a square has been sampled.
Then comptue max to min frequency ratio and compare it to a threshold (that has been chosen empirically). If the value is higher than the threshold, assume the distribution is not uniform.
We will also keep a helper function to visualize what our sampler returns.
Quick verification of the test function itself. First, a sampler that returns points outside of the unit circle:
Then, a sampler that leaves some areas in the unit circle undersampled:
Let's tackle the original prolem now.
The first solution I came up with was dead simple: generate a random angle \(\varphi \sim U(0; 2\pi)\), and a random length \(r \sim U(0; 1)\), then return a point \((r \cos(\varphi), r \sin\varphi)\).
It turns out that this is not a correct solution. In particular, the points won't be uniformly distributed in a circle. To see why, take a look at a diagram below. These are all points \((r_j \cos(\varphi_i), r_j \sin\varphi_i)\) where \(\varphi_i=\frac{2\pi i}{N}, r_j=\frac{j}{M}\) for \(i=0,1,...,N, j=0,1,...,M\) for some integers \(N\) and \(M\) (in this particular case, \(N=100, M=15\)).
It's easy to see that density is higher closer to the center of the circle and decreases the further we go from the center. Below is what would be a sample of points given by this erroneous method.
It actually took me quite some time to understand that this is a problem.
After I understood the problem with uniform, I was wondering: how does leetcode check the correctness of the function? This is usually not something a typical online judge is capable of doing. More thoughts on this later, and for now let's proceed to see a simple and working solution.
The next simplest thing to try is a method called "rejection sampling". In essense, pick a random point on a \(2 \times 2\) square. If it is in the unit circle - return it, otherwise try sampling one more time.
What's the expected number of invocations of \(sampleFromSquare\) for every invocation of \(randPoint\)?
It's easy to see that it equals \(\frac{Area_{square}}{Area_{circle}}=\frac{4}{\pi} \approx 1.3 \).
One might wonder - can we choose a random point on a square with only one sample from \(U(0, 1)\).
It's possible if there exists a bijective mapping from a unit square to a unit segment. An example of such mapping can be seen here: https://demonstrations.wolfram.com/BijectiveMappingOfAnIntervalToASquare/. Say we have generated a point \(p \sim U(0, 1)\), take all odd digits to form an \(x\)-coordinate and all even digits to form a \(y\)-coordinate. In the linked demonstration \(base=2\) is chosen, but any other base would work just as well. For example, I have chosen \(base=6\) below.
But now let's come back to the very first approach.
We've seen that choosing \(\varphi \sim U(0, 2\pi)\) and choosing \(r \sim U(0, 1)\) won't work. Let's find out the correct distribution to sample l from.
Let's split the circle into concentric rings. Consider 2 rings of infinitely small width \(\Delta\). Suppose their radius is \(r_1\) and \(r_2\). Then the area of the first ring can be computed as an area of a rectangle with sides \(len(r_1)\) and \(\Delta\) where \(len(r_1)=2 \pi r_1\), so \(Area_1=\Delta \times 2 \pi r_1\). Same for the second ring - its area is \(Area_2 = \Delta \times 2 \pi r_2\).
Let \(f(t)\) be a pdf with value \(0\) everywhere outside of \(0 \leq t \leq 1\).
For \(f\) to be a valid pdf, we require \(\int_0^1 f(t) dt=1\).
If we want our points to be uniformly distributed, we want to have \(\frac{f(r_1)}{f(r_2)} = \frac{Area_1}{Area_2} = \frac{r_1}{r_2}\).
This should hold true for any chosen radious \(r_1, r_2\). So let's choose \(r_2=1\) and assume \(f(r_2)=C\). Then \(f(r_1)=Cr_1\), and this is the only type of function that will satisfy our property of uniformity. Now we only have to find the constant which is easy to do:
\(\int_0^1 Cr dr = 1\)
\(\frac{Cr^2}{2} \vert_0^1 = 1\)
\(C/2 = 1\)
\(C = 2\)
So now we have it, pdf is \(f(t)=2t\), and CDF is \(F(t)=t^2\).
Graph of pdf:
Graph of cdf:
But how do we get to this distribution if we only have uniform (or if we only have normal, or if we only have anything else).
Suppose our environment provides a method to sample a value from some distribution \(v_1 \in [0; 1]\) with CDF \(F_1\), and we would like to sample a value from 0 to 1 from a distribution with CDF \(F_2: v_2 \in [0; 1]\).
\(F_1(v_1) = F_2(v_2)\)
\(v_2 = F_2^{-1}(F_1(v_1))\)
For \(v_1 \sim U(0, 1)\) CDF is \(F_1(t)=1-v_1\), so \(v_2(v_1) = \sqrt{1 - v_1}\) which leads to the following code:
One more way to solve the same problem is to divide the circle with vertical lines into segments (each can be approximated with a rectangle). Then \(pdf(x) = k \times height(x) = 2k\sqrt{1 - x^2}\). After x is chosen, y can be chosen uniformly \(y \sim U(-\sqrt{1 - x^2}, +\sqrt{1 - x^2})\).
\(\int_{-1}^1 pdf(x) dx = 1\) for it to be a valid pdf.
\(\int_{-1}^1 2k\sqrt{1-x^2}dx = k\pi\), so \(k = \frac{1}{\pi}\).
For pdf \(f(x)=\frac{2\sqrt{1-x^2}}{\pi}\) CDF is \(F(x)=x\frac{\sqrt{1-x^2} + \sin^{-1}x}{\pi} + C\) such that \(F(-1)=0\) and \(F(1)=1\).
\(F(-1)=(-\pi/2)/\pi+C \Rightarrow C=1/2.\) so \(F(x) = \frac{x\sqrt{1-x^2} + \sin^{-1}x)}{\pi} + 1/2\).
As a sanity check, \(F(1)=(\pi/2)/\pi + C = 1\).
Graph of pdf:
Graph of cdf:
The problem though is I can't find a way to express \(CDF^{-1}\). However, since CDF is monotonically increasing we can use binary search.
The code then will look like:
To conclude, let's just take a look at distributions that is biased towards outer edge:
© Copyright 2020, Iaroslav Tymchenko