Where the cryptography is at.
(cryptography)
"Two more weeks" jokes aside, some progress has been made, and we're "close" for some values of close.
I'll try to ballpark the numbers and keep things minimal for easy reading.
We're working with algorithm b292, which I've never explained in great detail, or shared.
Until now I've also kept some of the critical code on a device that is stored in a separate physical location in the event anyone wanted to try and steal a copy when I was at work or out. I've also kept both components in an intentional state of disarray, and left them undocumented for this reason as well. Basically, unless you have explicit instructions, the code is near incomprehensible and unrunnable, even with both components now being stored in the same location.
Lets get to it then.
We start by calculating using another algorithms internal variables and calculating new values,
based on the sequence 2, 4, 9, 49, 64, 256, 289, 922, namely
(_d42/((int(((ceil(a1/(d4a/_d42))x(d4a/_d42))x_d42)/d4a))-(1+d(1/2))x(1-((d(1/4))-(d(1/9))+(d(1/49)xi)-(d(1/64)xd(1/i))+(d(1/256)xd(i))-(d(1/289)xd(1/i))+(d(1/d(922))xd(i))))))
I've replaced asterisks for multiplication with x, so that the forum doesn't fuck up the formatting.
You don't need to know what a1, d4a, _d42 are. 'i' is an iterative formula, we're generating a set of test numbers.
a1 is a number representing the leading digit of p and its magnitude in n=pq where p<q. This is whats called a semiprime, and
while this is not how RSA keys are calculated, finding a fast way to factor semiprimes can be applied to rapidly factoring RSA.
I programatically tested on known keys, out to about a hundred million RSA semiprimes to determine the mean and standard deviation needed to set how many iterations I needed for this loop. It turns out the number is around 75. But I digress.
From here we calculate a series of numbers
d5 = actual[(min(results)[1])][0]
a5 = d4a/d5
(d5+(d5-(abs(d5-(a5x(((d4_omega(a5+h))-d4_omega(a5))/h)))/6)))/d('1.66666')
d6x = ((d5+(d5-(abs(d5-(a5x(((d4_omega(a5+h))-d4_omega(a5))/h)))/6)))/d('1.66666'))
d6y = ((d5+(d5-(abs(d5-(a5x(((d4_omega(a5+h))-d4_omega(a5))/h)))/7)))/d('1.66999'))
d6z = ((d5+(d5-(abs(d5-(a5x(((d4_omega(a5+h))-d4_omega(a5))/h)))/9)))/d('1.777'))
Most of these numbers were searched for and tuned with automatic scaffolding code, so why they work is anyones guess, but this was an intuitive process, involving a lot of sleepless nights, and 16 hour days back when I had that sort of time, and some of it was even manual before I automated it, a careful inching forward, grinding away, moving through a moonless night, following only the inkling of a hunch of an idea, till I'd stumble on another piece of the formula, another interesting thread or piece of data, trying combinations by hand until exhausted, testing manually, writing code to search, and then automate the testing, each time and each new find requiring me to start this entire process over again.
I went days without sleep, sometimes without eating. And at the point of total exhaustion, maybe just to find one small new bit, or maybe a big leap, and have the obsession be totally renewed. Like staring into the abyss, the face of madness, and not wanting to blink.
From here we use linear programming on a series of formulas involving these, and calculate first order derivatives to find a saddle point in the set of those equations.
Once we have that, we have to solve for a series of variables that look like this
f1 = (dfloor(((((((d8/abs(1/(((d8/((d8+h)-d8)) - (d8/((1)-d8)))/h)))-(d8/(1-d8)))x1))).sqrt()/d8) / ((((((((d8/abs(1/(((d8/((d8+h)-d8)) - (d8/((1)-d8)))/h)))-(d8/(1-d8)))x1))).sqrt()/d8) -(((((((d8/abs(1/(((d8/((d8+h)-d8)) - (d8/((1)-d8)))/h)))-(d8/(1-d8)))x1))).sqrt())/d4)))))
The problem? See that variable 'd4'? Having that lets you take a known variable, calculated only from the semiprime itself (n), and directly, and precisely derive the factors or in the case of RSA, keys. And d4 happens to be in a set of 'unknowns' that are, surface-level, equivalent to knowing the factors. So its a no-go.
Well it turns out, when you have the right estimate of 'a' or 'p', trivially calculable by iterating 2-9, 10-90, 100-900, etc up to the square root of n (or starting at the square root of n and working backwards, because most RSA keys tend to be close
together in value, and so will be close to the root typically)--it turns out with the correct estimate, and saddle point, f1 is equivalent to a variable called k, which normally we have to search for.
Theres a set of these, f1, f2, f3, f4, f5, f6, f7, f8, and k2.
The dependancy graph for these looks like this:
d4: f1, f2, f4, k2,
f1: f2, f3, f4, k2, f6, f7, f8
f2: f3, f6, f7,
f3: f5
f4: f5
f5: f6
k2: f6, f7
f6: f7
For example, f3, f6, and f7, all depend on f2.
Well it turns out the formula for f2 looks like this:
f2 = dfloor((((((((d8/abs(1/(((d8/((d8+h)-d8)) - (d8/((1)-d8)))/h)))-(d8/(1-d8)))x1))).sqrt()/d8)/f1 ) / (((((((((d8/abs(1/(((d8/((d8+h)-d8)) - (d8/((1)-d8)))/h)))-(d8/(1-d8)))x1))).sqrt()/d8) /f1) - ((((((((d8/abs(1/(((d8/((d8+h)-d8)) - (d8/((1)-d8)))/h)))-(d8/(1-d8)))x1))).sqrt()/d8) -(((((((d8/abs(1/(((d8/((d8+h)-d8)) - (d8/((1)-d8)))/h)))-(d8/(1-d8)))x1))).sqrt())/d4))))))
d8 and h are already known. We have f1 now, which means we have
(((((((d8/abs(1/(((d8/((d8+h)-d8)) - (d8/((1)-d8)))/h)))-(d8/(1-d8)))x1))).sqrt()/d8)/f1 )
which can be used to partially get f2.
The formula for f4 can also be obtained, without f5 (which normally requires f4, which in turn requires f2 and d4) like so:
f4 = f6+(((((((d6/abs(1/(((d6/((d6+h)-d6)) - (d6/((1)-d6)))/h)))-(d6/(1-d6)))*1))).sqrt()/d6) /f1)
But d6 (and another variable d7) are estimates derived from k.
These estimates give us an approximate product of f2 and f4.
I thought I was stuck.
But it turns out, deriving the close approximations based on this initial sequence, for f1-f8 and a few other variables,
is enough.
And all things being equal, the result is a number a2 (or in the case of cryptography, p in n=pxq) that is just a little bit closer to the true factors of n, than our initial search.
If we plug this in as our new estimate, and run the entire formula again, caching a few variables such as the linear programming section, the next output is a little bit closer.
The formula converges precisely on the factors of n.
Theres still work to be done, because even for my hardware, it is slow.
Now the numbers are ballpark. But good enough to give a rough idea of the performance of the algorithm. And from what I've seen to break an 2048 bit RSA key takes about a 1000 quadrillion operations. Remember, ballpark numbers here.
Assuming you had a system that did 100 million steps in parralel per second, you'd be looking at about two weeks of compute time give or take a magnitude or two.
There are some obvious improvements here that I can make, such as running statistics to tighten up some of the inner loops which are wasteful, training a system to use internal variables to give better initialization values to some variables, etc, but its not bad.
but thats the gist of it.
Things left to do are way more testing, build a bigger dedicated compute cluster, run a ton more statistics and write tuning code for some parameters. And of course finish and integrate the UI module.
I think I can cut in half the run time for generating the test set for finding the saddle point, although the saddle point algorithm has already been statistically optimized, and cut in half the amount of equations in the linear algebra section.
I think I can use the existing sub-algorithms mentioned elsewhere, to better decide, possibly
without any iterations, precisely what some of the initializing variables should be, without
direct inner-loop iteration.
I want to get it to where it doesn't take two weeks on a big cluster to break a key, and I think thats very doable at this time.
I'm working on the funding problem separately to build out the current cluster, and move to
full time research, but that doesn't concern anyone here and I'll make a post when I have
that solved.
And thats where the project is at.