Prime Numbers, Algorithms and Computer Architectures

⏲ A 9 min read

What does the principle of locality of reference have to do with prime numbers? This is what we will discover in this post. We will use the segmented version of the Sieve of Eratosthenes to see how hardware specifications can (read should) be used to fix design parameters for our routines.

Table of contents:

A natural number \(p\in\mathbb N\) is said to be prime if its only divisors are 1 and \(p\) itself. Any other number that does not have this property is sometimes called composite. The discovery that there are infinitely many prime numbers dates back to c. 300 BC and is due to Euclid. His argument by contradiction is very simple: suppose that, indeed, there are only finitely many primes, say \(p_1,\ldots,p_n\). The natural number

$$m=p_1p_2\cdots p_n + 1$$

is larger than and evidently not divisible by any of the primes by construction, and therefore \(m\) must be prime. However, being larger than any of the \(p_k\)s, \(m\) cannot be one of the finitely many primes, thus reaching to a contradiction.

Prime numbers play a fundamental role in Number Theory, a branch of Mathematics that deals with the properties of the natural numbers. Everybody gets to know about the prime factorisation of the natural numbers, a result so important that has been given the name of Fundamental Theorem of Arithmetic.

Counting Primes

Even though we saw that prime numbers are infinite, one might still want to know how many prime numbers are there within a certain upper bound. As numbers become bigger, the help of a calculator becomes crucial to tackle this problem and therefore it makes sense to think of algorithms that would get us to the answer efficiently.

The fastest way to count all the primes less than a given upper bound \(n\) is by means of an ancient algorithm known as the Sieve of Eratosthenes. The idea is to start with the sequence of all the numbers from 0 up to \(n\) and discard/mark the composite numbers as they are discovered. By definition, 0 and 1 are not prime, so they are removed. The number 2 is prime, but all its multiples are not, so we proceed by removing all the multiples of 2. We keep the first number that remains after 2, 3 in this case, and proceed to remove its multiples (starting from its square, since smaller multiples have already been removed at the previous steps). The process is repeated until it is no longer possible to proceed beyond the assigned bound \(n\). It is clear that it is enough to get up to at most \(\lceil\sqrt n\rceil\).

The following is a simple implementation in C++.

#include <iostream>
#include <vector>
#include <cmath>
#include <cassert>

using namespace std;

class Sieve {

private:
  vector<bool> * sieve;
  vector<int>  * primes;

public:
  Sieve(int n) : primes(NULL) {
    sieve = new vector<bool>(n + 1, true);

    (*sieve)[0] = false;
    (*sieve)[1] = false;
    for (int i = 0; i <= ceil(sqrt(n)); i++)
      if ((*sieve)[i])
        for (unsigned int j = i * i; j < sieve->size(); j += i)
          (*sieve)[j] = false;
  }

  ~Sieve() {
    delete sieve ;
    delete primes;
  }

  vector<int> * get_primes() {
    if (primes == NULL) {
      primes = new vector<int>();

      for (unsigned int i = 0; i < sieve->size(); i++)
        if ((*sieve)[i]) primes->push_back(i);
    }
    return primes;
  }

  int get(int i) {
    assert(i < count());
    return (*primes)[i];
  }

  int count() {
    if (primes == NULL) get_primes();

    return primes->size();
  }
};

int main() {
  int n;

  cin >> n;

  Sieve s = Sieve(n);

  cout << "There are " << s.count() << " primes between 0 and " << n << endl;

  #ifdef VERBOSE
  for (int i = 0; i < s.count(); i++) cout << s.get(i) << endl;
  #endif

  return 0;
}

A vector of booleans is implemented in C++ by an arry of bits instead of single bytes. Apart from turning all the possible complier optimisations, at the hardware level, this more compact data structure is more cache-friendly. Here is a first link between a software implementation of a prime search and the computer architecture the code runs on.

With an input of the order of \(10^6\) the sieve is still quite fast. However the memory requirements are substantial: up to \(10^9\) we are able to still use integers, but the memory consumption is of the order of the GB. The amount of memory on the system then can pose a serious limitation to the input parameter.

Segmented Sieve

If we want to list and/or count all the primes between two given (and possibly quite large) integers, we need a Segmented Sieve. If we are interested in all the primes between \(a\) and \(b\) we could, in principle, use the sieve of Eratosthenes to find all the primes up to \(b\) and then list/count all the primes larger than \(a\). But with \(b\) of the order, say, \(10^{15}\), a lot of memory is required to hold the result. Instead we can split the interval \([a,b]\) into chunks and process them separately.

The two main questions that we need to answer are: how do we adapt the sieve algorithm to start from \(a\) rather than 0, and how do we fix the chunk size. Let us deal with the latter question first. The reason why we need a segmented sieve in the first place is because of memory limitations. So an upper bound for the chunck size is given by the available memory. However, for large values of the inputs, the sieve might need to jump to memory location which are further apart. But how do we quantify this "further apart"? The answer, again, is in the system architecture, which quite likely include a system of cache memory. In order not to violate the locality principle we should choose a chunk size which is comparable to the cache size. Assuming this to be of the order of the MB, and recalling that vector<bool> is an array of bits, a possible chunk size is of the order of \(10^7\).

Coming to the question of how to implemente a segmented sieve, all we need to do is mark/remove all the composite number in range. Of course we would need to start by removing all the even numbers, then all the multiples of 3, then of 5 and so on. Therefore we still need the knowledge of the primes starting from 2 and going above. But how much above? Since our upper limit is \(b\), we need all the prime numbers up to \(\lceil\sqrt b\rceil\), which can be obtained with the standard sieve discussed earlier. These prime numbers can then be used to discover all the primes in the range \([a,b]\). We start by removing the first even number greater than or equal to \(a\), together with all the numbers obtained by repeatedly adding 2 to it until we are out of bound. More generally, to find the first multiple of the prime \(p\) in \([a,b]\) we use the formula

$$s = \left\lceil\frac ap\right\rceil\cdot p$$

However, recall that, for the standard sieve we really have to start from \(p^2\), since lower multiples of \(p\) have already been removed at the previous iteration. Therefore, as our starting point we pick the maximum between \(s\) and \(p^2\) (actually between \(\lceil a/p\rceil\) and \(p\)).

The following is a simple implementation of the Segmented Sieve in C++.

#define CHUNK 10000000 // 10e7

class SSieve {

private:
  Sieve        * sieve;   // Sieve
  vector<bool> * ssieve;  // Segmented Sieve
  vector<int>  * seg_c;   // primes in each segment
  long long      a, b;    // Bounds
  int            c;       // Cached primes count
  int            seg;     // Current segment
  long long      size;    // Total numbers in interval
  int            max_seg; // Total number of segments

  void do_segment(int s) {
    if (seg == s) return; // Do not regenerate the current segment

    assert(s < max_seg);

    seg = s;

    // Determine segment bounds
    unsigned int l = a + s * CHUNK;
    unsigned int h = l + min(CHUNK, (int)size - s * CHUNK) - 1;

    // Allocate the new segmented sieve
    if (ssieve) delete ssieve;
    ssieve = new vector<bool>(h - l + 1, true);

    // Remove composite numbers in segment
    for (int p : (*sieve->get_primes())) {
      if (p * p > h) break;
      for (int i = max((int)l/p + (l % p == 0 ? 0 : 1), p) * p - l; i < ssieve->size(); i += p) //{
        (*ssieve)[i] = false;
    }
  }

public:
  SSieve(long long low, long long high) : c(-1), ssieve(NULL), seg(-1), seg_c(NULL) {
    assert(low <= high);

    seg_c   = new vector<int>(1, 0);

    if (high < 2) {
      c = 0;
      return;
    }

    if (low < 2) low = 2;

    a       = low;
    b       = high;
    size    = b - a + 1;
    sieve   = new Sieve(ceil(sqrt(b))); // The standard sieve
    max_seg = size / CHUNK + (size % CHUNK > 0 ? 1 : 0);
  }

  ~SSieve() {
    delete sieve;
    delete ssieve;
    delete seg_c;
  }

  unsigned int count() {
    if (c == -1) {
      c = 0;
      for (unsigned int i = 0; i < max_seg; i++) {
        do_segment(i);
        for (bool p : (*ssieve)) if (p) c++;
        // Keep track of the number of primes in segments
        // This is used by SSieve::get to retrieve the primes
        seg_c->push_back(c);
      }
    }
    return c;
  }

  long long get(int i) {
    assert(i >= 0 && i < count());

    int s, k, n = 0;

    // Determine which segment the requested prime belongs to
    for (s = 0; s < seg_c->size() - 1; s++)
      if ((*seg_c)[s + 1] > i) break;

    // Reconstruct the segmented sieve if necessary
    do_segment(s);

    // Translate into the actual prime
    int j = i - (*seg_c)[s];
    for (k = 0; k < ssieve->size(); k++) {
      if ((*ssieve)[k]) n++;
      if (n > j) break;
    }

    return k + s * CHUNK + a;
  }
};

This can be tested with a slightly modified main procedure, for example

int main() {
  long long n, m;

  cin >> n >> m;

  SSieve ss = SSieve(n, m);

  cout << "There are " << ss.count() << " primes between " << n << " and " << m << endl;

  #ifdef VERBOSE
  for (int i = 0; i < ss.count(); i++) cout << ss.get(i) << endl;
  #endif

  return 0;
}