I ran some benchmarks on a prime number search using the well-known sieve of Eratosthenes, once in my L1VM bracket language and once in Java. Both programs search for the prime numbers up to 10000000.

My primes-4 L1VM program uses the new if optimizer, which optimizes if statements. It saves two operations and uses only two opcodes for an if.

The Java program seems to be a good basis for comparison and is well optimized IMHO.

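My L1VM is faster than Java in this case, though only slightly in wall-clock time: about 4.60 versus 4.67 seconds elapsed. It needs fewer cycles (about 10.8 versus 12.3 billion) because it runs more instructions per cycle (1.36 versus 0.70), even though it executes more instructions in total. The page fault count is also a lot lower (957 versus 4611).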

Or in other words: “Let’s put a dent in the universe!” - Steve Jobs

... snip ...
9999971
9999973
9999991

 Performance counter stats for 'java SieveOfEratosthenes':

          5.123,43 msec task-clock                #    1,098 CPUs utilized          
             6.798      context-switches          #    0,001 M/sec                  
                82      cpu-migrations            #    0,016 K/sec                  
             4.611      page-faults               #    0,900 K/sec                  
    12.273.745.268      cycles                    #    2,396 GHz                    
     8.646.049.734      instructions              #    0,70  insn per cycle         
     1.814.826.502      branches                  #  354,221 M/sec                  
        33.365.480      branch-misses             #    1,84% of all branches        

       4,666948099 seconds time elapsed

       2,580319000 seconds user
       2,568318000 seconds sys
... snip ...
9999971
9999973
9999991
EXIT

 Performance counter stats for 'vm/l1vm-nojit prog/primes-4':

          4.333,76 msec task-clock                #    0,943 CPUs utilized          
               215      context-switches          #    0,050 K/sec                  
                22      cpu-migrations            #    0,005 K/sec                  
               957      page-faults               #    0,221 K/sec                  
    10.759.561.410      cycles                    #    2,483 GHz                    
    14.633.831.959      instructions              #    1,36  insn per cycle         
     2.252.911.335      branches                  #  519,852 M/sec                  
        13.485.970      branch-misses             #    0,60% of all branches        

       4,597503274 seconds time elapsed

       2,853030000 seconds user
       1,484863000 seconds sys
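
The "insn per cycle" column in the two perf reports above is just instructions divided by cycles. As a small sketch (the class name PerfRatios and the method insnPerCycle are made up for this illustration and are not part of L1VM or perf), the ratios can be recomputed from the raw counters like this:

```java
import java.util.Locale;

public class PerfRatios {
    // Instructions retired per CPU cycle (hypothetical helper name).
    public static double insnPerCycle(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    public static void main(String[] args) {
        // Raw counters copied from the two perf reports above.
        long javaCycles = 12_273_745_268L;
        long javaInsn   =  8_646_049_734L;
        long l1vmCycles = 10_759_561_410L;
        long l1vmInsn   = 14_633_831_959L;

        // Locale.ROOT forces a decimal point regardless of the system locale.
        System.out.printf(Locale.ROOT, "Java insn per cycle: %.2f%n",
                insnPerCycle(javaInsn, javaCycles));
        System.out.printf(Locale.ROOT, "L1VM insn per cycle: %.2f%n",
                insnPerCycle(l1vmInsn, l1vmCycles));
        // How many times more cycles the Java run needed.
        System.out.printf(Locale.ROOT, "Cycles Java / L1VM:  %.2f%n",
                (double) javaCycles / l1vmCycles);
    }
}
```

The two insn per cycle values should agree with what perf printed (0,70 and 1,36). The cycle ratio shows that the L1VM run needs fewer cycles even though it executes more instructions.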

The primes-4 L1VM program is in my GitHub repository in prog/ directory. Here is the Java program I used to benchmark Java:

class SieveOfEratosthenes
{
    void sieveOfEratosthenes(int n)
    {
        // Create a boolean array "prime[0..n]" and initialize
        // all entries as true. A value in prime[i] will
        // finally be false if i is not a prime, else true.
        boolean prime[] = new boolean[n+1];
        // Use i <= n so that prime[n] is initialized too.
        for(int i = 0; i <= n; i++)
            prime[i] = true;

        for(int p = 2; p*p <=n; p++)
        {
            // If prime[p] is not changed, then it is a prime
            if(prime[p] == true)
            {
                // Update all multiples of p
                for(int i = p*p; i <= n; i += p)
                    prime[i] = false;
            }
        }

        // Print all prime numbers
        for(int i = 2; i <= n; i++)
        {
            if(prime[i] == true)
                System.out.println(i);
        }
    }

    // Driver Program to test above function
    public static void main(String args[])
    {
        int n = 10000000;
        System.out.print("Following are the prime numbers ");
        System.out.println("smaller than or equal to " + n);
        SieveOfEratosthenes g = new SieveOfEratosthenes();
        g.sieveOfEratosthenes(n);
    }
}

// This code has been contributed by Amit Khandelwal.

More information about this Java program is here: Sieve of Eratosthenes

Update

If I compile the L1VM with -O2 instead of -Os (size optimization) then I get the following result:

9999971
9999973
9999991
EXIT

 Performance counter stats for 'vm/l1vm-nojit-O2 prog/primes-4':

          3.991,23 msec task-clock                #    0,948 CPUs utilized          
                90      context-switches          #    0,023 K/sec                  
                12      cpu-migrations            #    0,003 K/sec                  
               947      page-faults               #    0,237 K/sec                  
     9.938.794.245      cycles                    #    2,490 GHz                    
    13.572.030.984      instructions              #    1,37  insn per cycle         
     1.983.227.629      branches                  #  496,896 M/sec                  
        12.118.955      branch-misses             #    0,61% of all branches        

       4,211125771 seconds time elapsed

       2,607546000 seconds user
       1,385135000 seconds sys

That is a bit faster (about 4.21 versus 4.60 seconds elapsed) and makes it clearer that the L1VM is faster than Java in this benchmark.