I wrote a TI-BASIC program to find the first 999 primes, and I'm trying to optimize it for speed. This version using trial division takes 100 seconds on jsTIfied with 2.55MP.

Code:
```startTmr->T Repeat checkTmr(T:End 7907->N   //upper limit {5->L2     //L2 is small primes; L1 is all {2,3,5->L1    //2 and 3 already checked by mod 6 1->I 5->P While P<=N    For(P,P,min(N,L2(I)^^2),6       If min(remainder(P,L2       P->L1(1+dim(L1       If min(remainder(P+2,L2         P+2->L1(1+dim(L1     End     Repeat P<L1(I)^^2-2   //repeat until the square of largest is large enough         I+1->I         L1(I+2->L2(I   //remember 2 and 3     End End Disp checkTmr(T 1/(max(L1)=7907 and dim(L1)=999)   //"assert" list is correct```

jonbush has a solution in 45 seconds using a segmented sieve of Eratosthenes, and I've heard earthnite has a faster one in 29 seconds. I'll also switch to a sieve to see how fast it can possibly be.

EDIT: Replaced P with [recursiven]; now 98 seconds.
Second attempt with a segmented sieve with segment size 210: 76 seconds. It can still be improved significantly, as I haven't done any wheel sieving yet.

EDIT: 69 seconds.
EDIT 2: 57 seconds.
EDIT 3: Now 54 seconds.

Code:
```startTmr->T Repeat checkTmr(T:End 7907->N 210->B    //segment size //first, make a list L3 of all 48 mod-210 residues of primes //they go from A+11 to A+211 //for all of them to be at most N //this takes 2 seconds 0->dim(L3 For([recursiven],1,B,2:     If min(remainder([recursiven],{3,5,7     [recursiven]->L3(1+dim(L3 End Disp checkTmr(T //now make a list L4 of small primes, less than sqrt(N). Don't include 2, 3, 5, 7. //takes 1 more second {11,13->L4 For([recursiven],13,sqrt(N),2:     If min(remainder([recursiven],{3,5,7,11,13     [recursiven]->L4(1+dim(L4 End //now make a list of their starting points, i.e. their squares //time is negligible max(L4+210,L4^^2->L5 11*21->L5(1 13*17->L5(2 Disp checkTmr(T //finally, make a list L2 to sieve on //list goes from A+2 to A+B+1 B->dim(L2 augment({2,3,5,7},L4->L1  //L1 is all primes //fill out the first bucket, from 11 to 211 For([recursiven],[recursiven],B,2:     If min(remainder([recursiven],{3,5,7,11,13     [recursiven]->L1(1+dim(L1 End Disp checkTmr(T For(A,B,N-B,B     //for each bucket, but not the first or last     Disp A          //bucket goes from A+1 to A+B     Fill(1,L2     For(I,1,dim(L4         For([recursiven],L5(I)-A,B,2L4(I   //zero all multiples             0->L2([recursiven]         End         [recursiven]+A->L5(I     End     Disp 1     For([recursiven],1,dim(L3:         If L2(L3([recursiven]         A+L3([recursiven]->L1(1+dim(L1     End End //finally, finish up the last bucket     Disp A          //bucket goes from A+1 to A+B     Fill(1,L2     For(I,1,dim(L4         For([recursiven],L5(I)-A,B,2L4(I   //zero all multiples             0->L2([recursiven]         End         [recursiven]+A->L5(I     End     Disp 1     For([recursiven],1,sum(not(cumSum(L3+A>N:         If L2(L3([recursiven]         A+L3([recursiven]->L1(1+dim(L1     End Disp checkTmr(T 1/(sum(L1)=3674994)```

EDIT 4: My program is now about the same speed as jonbush's: 43s on jsTIfied 2.55MP. I didn't account for differences in calculator speed.

Code:
```startTmr->T Repeat checkTmr(T:End 7907->N 210->B    //segment size //first, make a list L3 of all 48 mod-210 residues of primes //they go from A+11 to A+211 //for all of them to be at most N //this takes 2 seconds 0->dim(L3 For([recursiven],1,B,2:     If min(remainder([recursiven],{3,5,7     [recursiven]->L3(1+dim(L3 End Disp checkTmr(T //now make a list L4 of small primes, less than sqrt(N). Don't include 2, 3, 5, 7. //takes 1 more second {11,13->L4 For([recursiven],13,sqrt(N),2:     If min(remainder([recursiven],{3,5,7,11,13     [recursiven]->L4(1+dim(L4 End //now make a list of their starting points, i.e. their squares //time is negligible max(L4+210,L4^^2->L5 11*21->L5(1 13*17->L5(2 Disp checkTmr(T //finally, make a list L2 to sieve on //list goes from A+2 to A+B+1 B->dim(L2 augment({2,3,5,7},L4->L1  //L1 is all primes //fill out the first bucket, from 11 to 211 For([recursiven],[recursiven],B,2:     If min(remainder([recursiven],{3,5,7,11,13     [recursiven]->L1(1+dim(L1 End Disp checkTmr(T For(A,B,N-B,B     //for each bucket, but not the first or last     //Disp A          //bucket goes from A+1 to A+B     Fill(1,L2     L5-B->L5     For(I,1,dim(L4         For([recursiven],L5(I),B,2L4(I   //zero all multiples             0->L2([recursiven]         End         [recursiven]->L5(I     End     //Disp 1     For([recursiven],1,dim(L3:         If L2(L3([recursiven]         A+L3([recursiven]->L1(1+dim(L1     End End //finally, finish up the last bucket     //Disp A          //bucket goes from A+1 to A+B Fill(1,L2 L5-B->L5 For(I,1,dim(L4    For([recursiven],L5(I),B,2L4(I   //zero all multiples         0->L2([recursiven]     End End //Disp 1 For([recursiven],1,sum(L3+A<=N:     If L2(L3([recursiven]     A+L3([recursiven]->L1(1+dim(L1 End Disp checkTmr(T 1/(sum(L1)=3674994)```

It may be considered cheating that I've hardcoded the primes up to 13, but if that's an issue it can be easily fixed.

EDIT 5: 42s by pre-doubling L4 and by replacing A and B with finance variables. I don't know how else I can optimize it. Bucket size 420 speeds up the algorithm by 3s, but it also slows down the first bucket by 3s, so it breaks even.

The code is pretty big too: 507 bytes. It's a long way to earthnite's 29sthe part of mine that stores the already-identified primes into L1 is almost that slow by itself.
Apparently I wasn't using [recursiven] in the last saved version I have, so now it is down to 40s. My program currently seems kind of haphazardly patched together, so there might be a better way to do things. It is 495 bytes.
I think I could get my program down to 40s or 39s by using a wheel of 2,3 for sieving rather than just odds. This would be done by using L6 as a second list of multiples of L4, making these the 5 mod 6 multiples, and using L5 for 1 mod 6.

The bottleneck is not sieving, though; it's the prime-storing phase. That takes about 27 seconds of my 42 (0.7 seconds or so per iteration), so it's what really needs to be sped up. An explanation of it: L3 is the list of numbers mod 210 that can be primes. L2 is a boolean list of whether {A+1,...,A+210} are prime. I loop over L3, and whenever L2(L3(n)) shows there is a prime, I add it to the list. The following code runs 37 times, with dim(L3) equal to 48:

Code:
```For([recursiven],1,dim(L3:     If L2(L3([recursiven]     A+L3([recursiven]->L1(1+dim(L1 End```

It is possible to use SortA( rather than so many If statements to detect primes. That way, all the primes are moved to the back of the list and are contiguous.

Code:
``` //earlier, do {A+1,A+2,...A+210->L2, zero where non-primes SortA(L2 sum(not(L2->PV dim(L1)-PV->PMT For([recursiven],PV+1,dim(L2 A+L3([recursiven]->L1(PMT+[recursiven] End```

However, this presents some problems.

* SortA( is O(n^2), so it is much slower on larger lists. On a list of 210 elements it takes at least 3 whole seconds, which is unacceptable. Even on a list of
* Due to poor implementation, SortA( is slower on lists which have many repeated elements. This is a problem, since we're zeroing the non-prime indices, causing many elements to be zero.

The first problem can be mitigated by storing only the odd numbers, in a 105-element list. This causes the sorting time to be smaller, about 0.65s, which is still not fast enough. It can be further sped up by breaking up the list of 105 into two lists of 53 and 52, then sorting each individually for a total of 0.17s * 2 = 0.35 s. However, this will incur greater parser overhead in the rest of the program, so I'm not sure if it is fast enough.

As for the second problem, the values stored into L2 to mark non-primes must be somehow varied. Simply calling rand is much too slow (15 ms per call where we need ~1). I'm not sure how to solve this; maybe precomputing a list of random numbers would be a good idea.
I have a program, 660ish bytes including the timing code, that finishes in 30-34*epsilon seconds. By that I mean that the output of this:

Code:
```startTmr->Tmin Repeat checkTmr(Tmin:End //program startTmr->Tmax For([recursiven],0,|E9:    If startTmr=Tmax End Disp Tmax-Tmin,[recursiven]```

is 30 \n 34.

The value of epsilon in seconds seems to change depending on conditions that I can't pin down, so I'm just using it to compare different iterations of a program.

EDIT: Now 29-26*epsilon.
EDIT: Now 29-53*epsilon with prgmSIEVE9.
I have to take a break for now to do actual things, but I have 38s at the moment. This version still represents the even numbers, but it ignores them and all of the numbers eliminated by the wheel sieve during the elimination and screening parts. The implementation is less ideal than I would like, but it works decently. I can't think of anything to make this method better, but I imagine that only representing odd numbers like a Sieve of Sundaram, could yield increased performance by allowing for much larger segments. It is currently 522 bytes.

Code:
```startTmr->theta Repeat checkTmr(theta End {2,3,5,7->L3 210->dim(L2 Fill(1,L2 For(A,3,7,2    For([recursiven],A,210,2A       0->L2([recursiven]    End End L2->L1 {1->L5 For([recursiven],5,208,6)    If L2([recursiven]    [recursiven]->L5(1+dim(L5    If L2([recursiven]+2    [recursiven]+2->L5(1+dim(L5 End 209->L5(48 For(A,11,13,2    For([recursiven],A^^2,210,2A       0->L1([recursiven]    End End For([recursiven],2,48)    If L1(L5([recursiven]    L5([recursiven]->L3(1+dim(L3 End augment(210+L5,420+L5->L6 For(S,210,7770,630    S->PMT    sqrt(Ans+630->T    137+493(PMT<7770->V    augment(L2,augment(L2,L2->L1    5->PV    L3(Ans->|N    Repeat T<Ans       If Ans^^2>PMT       Then          Ans^^2-PMT          Else          remainder(Ans+Ansint(S/Ans),PMT          Ans+|Nnot(remainder(Ans,2       End       For([recursiven],Ans,V,2|N          0->L1([recursiven]       End       PV+1->PV       L3(Ans->|N    End    For([recursiven],1,31+17(V=630:       If L1(L5([recursiven]       PMT+L5([recursiven]->L3(1+dim(L3    End    If PMT<7770    Then       For([recursiven],1,96)          If L1(L6([recursiven]          PMT+L6([recursiven]->L3(1+dim(L3       End       0->dim(L1    End End checkTmr(theta+1```

Side note: On systems with a CPU cache, it is far better to restrict the size of segments to smaller than the cache size so that the program does not have to continually access the RAM during execution. I noticed that this had a major effect when I generated my list list of the first 100 million prime numbers (clicking the link will download a fairly large file).
There are diminishing returns to having larger segments. Imagine that setting up a segment takes 0.2s of work. If you had segments of size 210 you need to do this 35 times, meaning 7.0s of work. Moving to segments of 420 makes this 3.5s, so you've saved 3.5s. Doubling again to 840 saves you another 1.7s, but then going all the way to 7*210=1470, possible by representing only odds, saves only another 0.7s.

By the way, my programs take 7907 (the number being searched to) as a parameter, and also have modifiable segmenr size. The latest one represents only numbers that are 1 and 5 mod 6 and has segment size 420.

EDIT: Now 25-42*epsilon with prgmSIEVE10, using segment size 840 (only representing 280 numbers per segment). So about 24.6 seconds.
EDIT: 25-72*epsilon. Used the sieve to generate small primes too, instead of trial division.

Here's my latest program. I store numbers and sieve with a wheel of 2,3; and use a wheel of 2,3,5 for checking for primes, except for the last iteration which is 2,3. All numbers divisible by 7 are marked as composite so they don't need to be sieved.

Code:
```//estimated time: 24.3 seconds. [CLASSIC] startTmr->Tmin Repeat checkTmr(Tmin:End 7907->N 840->B    //real segment size 280->C    //stored segment size Disp checkTmr(Tmin //make a boolean list L2, stored in L6, to sieve on //represent only 1,5 mod 6 //{1,5,7,11,13,17,etc. 70->dim(L6 Fill(1,L6 For(I,5,7,2    //5,7    For([recursiven],1+int(I/3),70,2I       0->L6([recursiven]     End     For([recursiven],1+int(5I/3),70,2I       0->L6([recursiven]     End End //now make 4 copies of L6, for segment size 840 augment(L6,L6 augment(Ans,Ans->L6 //make a list L3 of small primes, less than sqrt(N). Don't include 2, 3, 5, 7. //works for sqrt(N)<121. Otherwise add checks by 11,... For([recursiven],4,sqrt(N)/3,2:    If L6([recursiven]    3[recursiven]-1->L3(1+dim(L3    If L6([recursiven]+1    3[recursiven]+1->L3(1+dim(L3 End //now make a list of their starting points, i.e. their squares //make a second, larger list of starts too for the other residue mod 6 //240 is subtracted from them every time L3^^2->L4 L4+L3*(6-2remainder(L3,3->L5 1+int(L4/3->L4    //now adjust to the correct positions for wheel 1+int(L5/3->L5 2L3->L3    //double the primes Disp checkTmr(Tmin //11,13 {2,3,5->L1    //next "prime" found is 1; will replace with 7 at end //first A is 210 For(A,0,N-B,B //for each bucket     //Disp A          //bucket goes from A+1 to A+840     L6->L2         //sieve list     For(I,1,dim(L3         For([recursiven],L4(I),280,L3(I   //zero all multiples             0->L2([recursiven]         End         [recursiven]->L4(I         For([recursiven],L5(I),280,L3(I   //zero all multiples             0->L2([recursiven]         End         [recursiven]->L5(I     End     L4-C->L4    L5-C->L5    //Disp 1    dim(L1->nMin    L1(nMin->PMT    L2(1)+nMin->L2(1    cumSum(L2->L2    L2(C->dim(L1     //largest cumulative count    A+B+1     For([recursiven],C,1,~10       Ans-2->L1(L2([recursiven]    //29       Ans-6->L1(L2([recursiven]-2  //23       Ans-4->L1(L2([recursiven]-3  //19       Ans-2->L1(L2([recursiven]-4  //17       Ans-4->L1(L2([recursiven]-5  //13       Ans-2->L1(L2([recursiven]-6 //11       Ans-4->L1(L2([recursiven]-7 //7       Ans-6->L1(L2([recursiven]-9 //1     End    PMT->L1(nMin End //finally, finish up the last bucket L6->L2 1+int((N-A)/3->PV PV->dim(L2 For(I,1,dim(L4         For([recursiven],L4(I),PV,L3(I   //zero all multiples             0->L2([recursiven]         End         For([recursiven],L5(I),PV,L3(I   //zero all multiples             0->L2([recursiven]         End End //Disp 1 dim(L1->nMin L1(nMin->PMT L2(1)+nMin->L2(1 cumSum(L2->L2 L2(PV->dim(L1 N+2 If 1=remainder(N,6 N->L2(L1(L2(PV For([recursiven],2int(PV/2),1,~2   //start on a 5 (even n)     Ans-2->L1(L2([recursiven]   //5     Ans-4->L1(L2([recursiven]-1   //1 End PMT->L1(nMin 7->L1(4        //replace the 1 with a 7 startTmr->Tmax For([recursiven],0,|E9:    If startTmr=Tmax End Disp Tmax-Tmin,[recursiven] 1/(sum(L1)=3674994)```
If you need more speed use a LUT.
ProgrammerNerd: What is there to put in the LUT? I don't bother with numbers divisible by 2 and 3, and when checking for candidate primes I skip numbers divisible by 7 as well. Storing the 48 candidate primes modulo 210 in a LUT was done in my 42s program, but I found a faster algorithm that makes that impossible now.

If you mean just hardcoding a list of 999 primes, that would be cheating.

Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

»
» All times are GMT - 5 Hours

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum