Login [Register]
Don't have an account? Register now to chat, post, use our tools, and much more.
I wrote a TI-BASIC program to find the first 999 primes, and I'm trying to optimize it for speed. This version using trial division takes 100 seconds on jsTIfied with 2.55MP.


Code:
startTmr->T
Repeat checkTmr(T:End
7907->N   //upper limit
{5->L2     //L2 is small primes; L1 is all
{2,3,5->L1    //2 and 3 already checked by mod 6
1->I
5->P
While P<=N
   For(P,P,min(N,L2(I)^^2),6
      If min(remainder(P,L2
      P->L1(1+dim(L1
      If min(remainder(P+2,L2
        P+2->L1(1+dim(L1
    End
    Repeat P<L1(I)^^2-2   //repeat until the square of largest is large enough
        I+1->I
        L1(I+2->L2(I   //remember 2 and 3
    End
End
Disp checkTmr(T
1/(max(L1)=7907 and dim(L1)=999)   //"assert" list is correct


jonbush has a solution in 45 seconds using a segmented sieve of Eratosthenes, and I've heard earthnite has a faster one in 29 seconds. I'll also switch to a sieve to see how fast it can possibly be.

EDIT: Replaced P with [recursiven]; now 98 seconds.
Second attempt with a segmented sieve with segment size 210: 76 seconds. It can still be improved significantly, as I haven't done any wheel sieving yet.

EDIT: 69 seconds.
EDIT 2: 57 seconds.
EDIT 3: Now 54 seconds.


Code:
startTmr->T
Repeat checkTmr(T:End
7907->N
210->B    //segment size
//first, make a list L3 of all 48 mod-210 residues of primes
//they go from A+11 to A+211
//for all of them to be at most N
//this takes 2 seconds
0->dim(L3
For([recursiven],1,B,2:
    If min(remainder([recursiven],{3,5,7
    [recursiven]->L3(1+dim(L3
End
Disp checkTmr(T
//now make a list L4 of small primes, less than sqrt(N). Don't include 2, 3, 5, 7.
//takes 1 more second
{11,13->L4
For([recursiven],13,sqrt(N),2:
    If min(remainder([recursiven],{3,5,7,11,13
    [recursiven]->L4(1+dim(L4
End
//now make a list of their starting points, i.e. their squares
//time is negligible
max(L4+210,L4^^2->L5
11*21->L5(1
13*17->L5(2
Disp checkTmr(T
//finally, make a list L2 to sieve on
//list goes from A+2 to A+B+1
B->dim(L2
augment({2,3,5,7},L4->L1  //L1 is all primes
//fill out the first bucket, from 11 to 211
For([recursiven],[recursiven],B,2:
    If min(remainder([recursiven],{3,5,7,11,13
    [recursiven]->L1(1+dim(L1
End
Disp checkTmr(T
For(A,B,N-B,B     //for each bucket, but not the first or last
    Disp A          //bucket goes from A+1 to A+B
    Fill(1,L2
    For(I,1,dim(L4
        For([recursiven],L5(I)-A,B,2L4(I   //zero all multiples
            0->L2([recursiven]
        End
        [recursiven]+A->L5(I
    End
    Disp 1
    For([recursiven],1,dim(L3:
        If L2(L3([recursiven]
        A+L3([recursiven]->L1(1+dim(L1
    End
End
//finally, finish up the last bucket
    Disp A          //bucket goes from A+1 to A+B
    Fill(1,L2
    For(I,1,dim(L4
        For([recursiven],L5(I)-A,B,2L4(I   //zero all multiples
            0->L2([recursiven]
        End
        [recursiven]+A->L5(I
    End
    Disp 1
    For([recursiven],1,sum(not(cumSum(L3+A>N:
        If L2(L3([recursiven]
        A+L3([recursiven]->L1(1+dim(L1
    End
Disp checkTmr(T
1/(sum(L1)=3674994)


EDIT 4: My program is now about the same speed as jonbush's: 43s on jsTIfied 2.55MP. I didn't account for differences in calculator speed.


Code:
startTmr->T
Repeat checkTmr(T:End
7907->N
210->B    //segment size
//first, make a list L3 of all 48 mod-210 residues of primes
//they go from A+11 to A+211
//for all of them to be at most N
//this takes 2 seconds
0->dim(L3
For([recursiven],1,B,2:
    If min(remainder([recursiven],{3,5,7
    [recursiven]->L3(1+dim(L3
End
Disp checkTmr(T
//now make a list L4 of small primes, less than sqrt(N). Don't include 2, 3, 5, 7.
//takes 1 more second
{11,13->L4
For([recursiven],13,sqrt(N),2:
    If min(remainder([recursiven],{3,5,7,11,13
    [recursiven]->L4(1+dim(L4
End
//now make a list of their starting points, i.e. their squares
//time is negligible
max(L4+210,L4^^2->L5
11*21->L5(1
13*17->L5(2
Disp checkTmr(T
//finally, make a list L2 to sieve on
//list goes from A+2 to A+B+1
B->dim(L2
augment({2,3,5,7},L4->L1  //L1 is all primes
//fill out the first bucket, from 11 to 211
For([recursiven],[recursiven],B,2:
    If min(remainder([recursiven],{3,5,7,11,13
    [recursiven]->L1(1+dim(L1
End
Disp checkTmr(T
For(A,B,N-B,B     //for each bucket, but not the first or last
    //Disp A          //bucket goes from A+1 to A+B
    Fill(1,L2
    L5-B->L5
    For(I,1,dim(L4
        For([recursiven],L5(I),B,2L4(I   //zero all multiples
            0->L2([recursiven]
        End
        [recursiven]->L5(I
    End
    //Disp 1
    For([recursiven],1,dim(L3:
        If L2(L3([recursiven]
        A+L3([recursiven]->L1(1+dim(L1
    End
End
//finally, finish up the last bucket
    //Disp A          //bucket goes from A+1 to A+B
Fill(1,L2
L5-B->L5
For(I,1,dim(L4
   For([recursiven],L5(I),B,2L4(I   //zero all multiples
        0->L2([recursiven]
    End
End
//Disp 1
For([recursiven],1,sum(L3+A<=N:
    If L2(L3([recursiven]
    A+L3([recursiven]->L1(1+dim(L1
End
Disp checkTmr(T
1/(sum(L1)=3674994)


It may be considered cheating that I've hardcoded the primes up to 13, but if that's an issue it can be easily fixed.

EDIT 5: 42s by pre-doubling L4 and by replacing A and B with finance variables. I don't know how else I can optimize it. Bucket size 420 speeds up the algorithm by 3s, but it also slows down the first bucket by 3s, so it breaks even.

The code is pretty big too: 507 bytes. It's a long way to earthnite's 29s—the part of mine that stores the already-identified primes into L1 is almost that slow by itself.
Apparently I wasn't using [recursiven] in the last saved version I have, so now it is down to 40s. My program currently seems kind of haphazardly patched together, so there might be a better way to do things. It is 495 bytes.
I think I could get my program down to 40s or 39s by using a wheel of 2,3 for sieving rather than just odds. This would be done by using L6 as a second list of multiples of L4, making these the 5 mod 6 multiples, and using L5 for 1 mod 6.

The bottleneck is not sieving, though; it's the prime-storing phase. That takes about 27 seconds of my 42 (0.7 seconds or so per iteration), so it's what really needs to be sped up. An explanation of it: L3 is the list of numbers mod 210 that can be primes. L2 is a boolean list of whether {A+1,...,A+210} are prime. I loop over L3, and whenever L2(L3(n)) shows there is a prime, I add it to the list. The following code runs 37 times, with dim(L3) equal to 48:


Code:
For([recursiven],1,dim(L3:
    If L2(L3([recursiven]
    A+L3([recursiven]->L1(1+dim(L1
End


It is possible to use SortA( rather than so many If statements to detect primes. That way, all the primes are moved to the back of the list and are contiguous.


Code:

//earlier, do {A+1,A+2,...A+210->L2, zero where non-primes
SortA(L2
sum(not(L2->PV
dim(L1)-PV->PMT
For([recursiven],PV+1,dim(L2
A+L3([recursiven]->L1(PMT+[recursiven]
End


However, this presents some problems.

* SortA( is O(n^2), so it is much slower on larger lists. On a list of 210 elements it takes at least 3 whole seconds, which is unacceptable. Even on a list of
* Due to poor implementation, SortA( is slower on lists which have many repeated elements. This is a problem, since we're zeroing the non-prime indices, causing many elements to be zero.

The first problem can be mitigated by storing only the odd numbers, in a 105-element list. This causes the sorting time to be smaller, about 0.65s, which is still not fast enough. It can be further sped up by breaking up the list of 105 into two lists of 53 and 52, then sorting each individually for a total of 0.17s * 2 = 0.35 s. However, this will incur greater parser overhead in the rest of the program, so I'm not sure if it is fast enough.

As for the second problem, the values stored into L2 to mark non-primes must be somehow varied. Simply calling rand is much too slow (15 ms per call where we need ~1). I'm not sure how to solve this; maybe precomputing a list of random numbers would be a good idea.
I have a program, 660ish bytes including the timing code, that finishes in 30-34*epsilon seconds. By that I mean that the output of this:


Code:
startTmr->Tmin
Repeat checkTmr(Tmin:End

//program

startTmr->Tmax
For([recursiven],0,|E9:
   If startTmr=Tmax
End
Disp Tmax-Tmin,[recursiven]


is 30 \n 34.

The value of epsilon in seconds seems to change depending on conditions that I can't pin down, so I'm just using it to compare different iterations of a program.

EDIT: Now 29-26*epsilon.
EDIT: Now 29-53*epsilon with prgmSIEVE9.
I have to take a break for now to do actual things, but I have 38s Sad at the moment. This version still represents the even numbers, but it ignores them and all of the numbers eliminated by the wheel sieve during the elimination and screening parts. The implementation is less ideal than I would like, but it works decently. I can't think of anything to make this method better, but I imagine that only representing odd numbers like a Sieve of Sundaram, could yield increased performance by allowing for much larger segments. It is currently 522 bytes.


Code:
startTmr->theta
Repeat checkTmr(theta
End
{2,3,5,7->L3
210->dim(L2
Fill(1,L2
For(A,3,7,2
   For([recursiven],A,210,2A
      0->L2([recursiven]
   End
End
L2->L1
{1->L5
For([recursiven],5,208,6)
   If L2([recursiven]
   [recursiven]->L5(1+dim(L5
   If L2([recursiven]+2
   [recursiven]+2->L5(1+dim(L5
End
209->L5(48
For(A,11,13,2
   For([recursiven],A^^2,210,2A
      0->L1([recursiven]
   End
End
For([recursiven],2,48)
   If L1(L5([recursiven]
   L5([recursiven]->L3(1+dim(L3
End
augment(210+L5,420+L5->L6
For(S,210,7770,630
   S->PMT
   sqrt(Ans+630->T
   137+493(PMT<7770->V
   augment(L2,augment(L2,L2->L1
   5->PV
   L3(Ans->|N
   Repeat T<Ans
      If Ans^^2>PMT
      Then
         Ans^^2-PMT
         Else
         remainder(Ans+Ansint(S/Ans),PMT
         Ans+|Nnot(remainder(Ans,2
      End
      For([recursiven],Ans,V,2|N
         0->L1([recursiven]
      End
      PV+1->PV
      L3(Ans->|N
   End
   For([recursiven],1,31+17(V=630:
      If L1(L5([recursiven]
      PMT+L5([recursiven]->L3(1+dim(L3
   End
   If PMT<7770
   Then
      For([recursiven],1,96)
         If L1(L6([recursiven]
         PMT+L6([recursiven]->L3(1+dim(L3
      End
      0->dim(L1
   End
End
checkTmr(theta+1


Please excuse the dearth of comments.

Side note: On systems with a CPU cache, it is far better to restrict the size of segments to smaller than the cache size so that the program does not have to continually access the RAM during execution. I noticed that this had a major effect when I generated my list list of the first 100 million prime numbers (clicking the link will download a fairly large file).
There are diminishing returns to having larger segments. Imagine that setting up a segment takes 0.2s of work. If you had segments of size 210 you need to do this 35 times, meaning 7.0s of work. Moving to segments of 420 makes this 3.5s, so you've saved 3.5s. Doubling again to 840 saves you another 1.7s, but then going all the way to 7*210=1470, possible by representing only odds, saves only another 0.7s.

By the way, my programs take 7907 (the number being searched to) as a parameter, and also have modifiable segmenr size. The latest one represents only numbers that are 1 and 5 mod 6 and has segment size 420.

EDIT: Now 25-42*epsilon with prgmSIEVE10, using segment size 840 (only representing 280 numbers per segment). So about 24.6 seconds.
EDIT: 25-72*epsilon. Used the sieve to generate small primes too, instead of trial division.

Here's my latest program. I store numbers and sieve with a wheel of 2,3; and use a wheel of 2,3,5 for checking for primes, except for the last iteration which is 2,3. All numbers divisible by 7 are marked as composite so they don't need to be sieved.


Code:
//estimated time: 24.3 seconds.
[CLASSIC]
startTmr->Tmin
Repeat checkTmr(Tmin:End
7907->N
840->B    //real segment size
280->C    //stored segment size
Disp checkTmr(Tmin

//make a boolean list L2, stored in L6, to sieve on
//represent only 1,5 mod 6
//{1,5,7,11,13,17,etc.
70->dim(L6
Fill(1,L6
For(I,5,7,2    //5,7
   For([recursiven],1+int(I/3),70,2I
      0->L6([recursiven]
    End
    For([recursiven],1+int(5I/3),70,2I
      0->L6([recursiven]
    End
End
//now make 4 copies of L6, for segment size 840
augment(L6,L6
augment(Ans,Ans->L6

//make a list L3 of small primes, less than sqrt(N). Don't include 2, 3, 5, 7.
//works for sqrt(N)<121. Otherwise add checks by 11,...
For([recursiven],4,sqrt(N)/3,2:
   If L6([recursiven]
   3[recursiven]-1->L3(1+dim(L3
   If L6([recursiven]+1
   3[recursiven]+1->L3(1+dim(L3
End

//now make a list of their starting points, i.e. their squares
//make a second, larger list of starts too for the other residue mod 6
//240 is subtracted from them every time
L3^^2->L4
L4+L3*(6-2remainder(L3,3->L5
1+int(L4/3->L4    //now adjust to the correct positions for wheel
1+int(L5/3->L5
2L3->L3    //double the primes
Disp checkTmr(Tmin

//11,13

{2,3,5->L1    //next "prime" found is 1; will replace with 7 at end
//first A is 210
For(A,0,N-B,B //for each bucket
    //Disp A          //bucket goes from A+1 to A+840
    L6->L2         //sieve list
    For(I,1,dim(L3
        For([recursiven],L4(I),280,L3(I   //zero all multiples
            0->L2([recursiven]
        End
        [recursiven]->L4(I
        For([recursiven],L5(I),280,L3(I   //zero all multiples
            0->L2([recursiven]
        End
        [recursiven]->L5(I
    End
    L4-C->L4
   L5-C->L5
   //Disp 1
   dim(L1->nMin
   L1(nMin->PMT
   L2(1)+nMin->L2(1
   cumSum(L2->L2
   L2(C->dim(L1     //largest cumulative count
   A+B+1
    For([recursiven],C,1,~10
      Ans-2->L1(L2([recursiven]    //29
      Ans-6->L1(L2([recursiven]-2  //23
      Ans-4->L1(L2([recursiven]-3  //19
      Ans-2->L1(L2([recursiven]-4  //17
      Ans-4->L1(L2([recursiven]-5  //13
      Ans-2->L1(L2([recursiven]-6 //11
      Ans-4->L1(L2([recursiven]-7 //7
      Ans-6->L1(L2([recursiven]-9 //1
    End
   PMT->L1(nMin
End
//finally, finish up the last bucket
L6->L2
1+int((N-A)/3->PV
PV->dim(L2
For(I,1,dim(L4
        For([recursiven],L4(I),PV,L3(I   //zero all multiples
            0->L2([recursiven]
        End
        For([recursiven],L5(I),PV,L3(I   //zero all multiples
            0->L2([recursiven]
        End
End
//Disp 1
dim(L1->nMin
L1(nMin->PMT
L2(1)+nMin->L2(1
cumSum(L2->L2
L2(PV->dim(L1
N+2
If 1=remainder(N,6
N->L2(L1(L2(PV
For([recursiven],2int(PV/2),1,~2   //start on a 5 (even n)
    Ans-2->L1(L2([recursiven]   //5
    Ans-4->L1(L2([recursiven]-1   //1
End
PMT->L1(nMin
7->L1(4        //replace the 1 with a 7
startTmr->Tmax
For([recursiven],0,|E9:
   If startTmr=Tmax
End
Disp Tmax-Tmin,[recursiven]
1/(sum(L1)=3674994)
If you need more speed use a LUT.
ProgrammerNerd: What is there to put in the LUT? I don't bother with numbers divisible by 2 and 3, and when checking for candidate primes I skip numbers divisible by 7 as well. Storing the 48 candidate primes modulo 210 in a LUT was done in my 42s program, but I found a faster algorithm that makes that impossible now.

If you mean just hardcoding a list of 999 primes, that would be cheating.
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are GMT - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement