First the new architecture
https://files.catbox.moe/1tygoo.png
https://files.catbox.moe/n308f1.png
Working on cryptography and got one of the last major modules functioning.
Summary:
The central insight is that you can generate approximations of factors using variations on modular arithmetic that are accurate to the leading digits of a semiprime's smaller factor. Then, by applying various constraints, you can dramatically reduce the search space.
The algorithm's efficiency comes from the fact that the modular set grows much more slowly than the bit length of the semiprime, making it vastly more scalable than traditional factorization methods.
This approach bridges number theory, machine learning, and optimization techniques in a novel way to solve the prime factorization problem.
code (partial)
https://pastebin.com/raw/DHADM0nW

Full text:
What I'm doing is using the code listed here to generate a list of possible values of A (where the problem is P = AB, or n = pq in cryptography), and then using the boundaries established with other variables from a separate module (called d4a, ac, at, av, etc.) to start eliminating values.

ML will be needed to classify whether unknowns are inverted (abs(unknown) < 1). While the number of values to consider grows, if we know whether an unknown is inverted we can take known variables and use them as upper (or lower) bounds on A, filtering out significant numbers of values.

More importantly, the number of values to check in the modular set grows much more slowly than the bit length of the semiprime being factored.
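A minimal sketch of that filtering step, with invented names (the real bounds come from the algebraic-set variables, and the inversion flag from the ML classification just described):

```python
# Minimal sketch (hypothetical names): prune candidate values of A using a
# known value as a bound, where the bound's direction flips when the
# unknown hidden inside that known value is "inverted" (abs(unknown) < 1).

def filter_candidates(candidates, known_bound, inverted):
    """Keep candidates of A consistent with one known product/quotient.

    If the hidden unknown is >= 1, the known value acts as an upper bound
    on A; if it is inverted (abs < 1), the same value acts as a lower bound.
    """
    if inverted:
        return [a for a in candidates if a >= known_bound]
    return [a for a in candidates if a <= known_bound]


# Toy usage: candidate approximations of A, pruned by two known values.
candidates = [3, 17, 101, 997, 4021]
candidates = filter_candidates(candidates, known_bound=1500, inverted=False)  # acts as upper bound
candidates = filter_candidates(candidates, known_bound=10, inverted=True)     # acts as lower bound
print(candidates)  # [17, 101, 997]
```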
We'll call the set of equations (d4a, at, ac, av, etc) the Algebraic Set (AS).
The set of modular values derived from p (or n) we'll call the Modular Set (MS).
The algorithms I wrote in the 290+ set, we'll call the Convergent Set (CS).
The machine learning model that classifies whether unknowns are likely to be inverted or not (abs(unknown) < 1), we'll call the Configuration Set (FS).
And the code written to generate boundary estimates we'll call the Threshold Set (TS).
How and why it works.
I determined the AS is indeterminate (I've checked well over a billion solutions so far, and probably on the order of several thousand novel methods to attack the problem, plus variations of it and simplified versions, within algebra and the minimal calculus that I know); that is, there are insufficient free parameters to map between the known and unknown sets, and thus to factor the product.
The modular set is derived from certain equations relating the various magnitudes to one another. The MS guarantees that for some vector of values [p, u, k, i, j], some of which are trivial to estimate, the generated set will always contain a value that approximates A in P = AB, up to the leading 1-2 digits (and often more) and the correct magnitude.
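For concreteness, this is the acceptance check that claim implies on a test key where the factor is known; the modular-set generator itself is only stubbed out here, since the real one is in the linked code:

```python
# Hedged sketch of the guarantee above: the generated modular set should
# contain at least one value matching the true factor's magnitude (digit
# count) and leading 1-2 digits. The generator is a placeholder; the real
# modular-set code is in the linked pastebin.

def matches_leading(approx: int, true_factor: int, digits: int = 2) -> bool:
    a, t = str(abs(approx)), str(abs(true_factor))
    return len(a) == len(t) and a[:digits] == t[:digits]

def modular_set(n: int) -> list[int]:
    raise NotImplementedError("placeholder for the generator in the linked code")

def ms_guarantee_holds(n: int, known_factor: int) -> bool:
    # On a test key, at least one candidate should hit the factor's
    # magnitude and leading digits.
    return any(matches_leading(c, known_factor) for c in modular_set(n))
```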
By taking the AS we can use the FS to derive boundaries, giving us the TS.
With the MS, we can use the TS to efficiently derive estimates of all unknowns that have factors in common, such as (but not limited to) u (unknown) from d4u, ut, cu, and uv (all known).
These estimates can then be fed to the CS to strip away multiple outer and inner loops of the algorithm, giving significant orders-of-magnitude improvements in convergence time.
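Put together, the data flow looks roughly like the sketch below. Every callable here is an invented placeholder for one of the modules above; only the wiring is meant literally.

```python
# Structural sketch of the AS -> FS -> TS -> (with MS) -> CS data flow.
# Every callable passed in is a stand-in for one of the modules described
# in the post; only the order in which they feed each other is shown.

from typing import Callable, Iterable

def factor_pipeline(
    n: int,
    derive_algebraic_set: Callable[[int], dict],          # AS: known values leaked by n (d4a, ac, at, av, ...)
    classify_inversions: Callable[[dict], dict],          # FS: ML flags for inverted unknowns
    derive_thresholds: Callable[[dict, dict], dict],      # TS: upper/lower bounds on the unknowns
    generate_modular_set: Callable[[int], Iterable[int]], # MS: leading-digit/magnitude approximations
    estimate_unknowns: Callable[[Iterable[int], dict], dict],
    converge: Callable[[int, dict], int],                 # CS: exact convergence, seeded by the estimates
) -> int:
    AS = derive_algebraic_set(n)
    FS = classify_inversions(AS)
    TS = derive_thresholds(AS, FS)
    MS = generate_modular_set(n)
    estimates = estimate_unknowns(MS, TS)
    return converge(n, estimates)
```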
This is one of the final pieces of the puzzle.
Current risks are:
- whether the ML component can correctly identify if unknown variables in known products and quotients are inverted or not.
Preliminary testing with a smaller model and datasets (<100k samples across various bit lengths) indicates this will work.
- re-grokking the later basilisks, which are incredibly convoluted and need parameter sweeps done for tuning.
- gains vanishing at very high bit lengths, owing to having to convert very large numbers to logarithms, and the ML model potentially failing because of that.
Mitigations:
- Larger sample-size testing for the ML model in order to establish the threshold set's definitive viability, rather than ad-hoc testing.
- writing clear documentation and procedures for the convergent set algorithm, so I don't have to work through it from scratch every time I want to implement something related to it.
- rewriting my current off-the-shelf neural-graph-search (NGS) library from the ground up to accept arbitrary-precision floats and integers.
Obstacles:
- I'm not currently versed enough in ML to rewrite the NGS, and no known library currently accepts arbitrary-precision types. Most rely on numpy, and even numpy's largest types won't handle data at 2-4k bit lengths. (A partial stopgap for the logarithm side is sketched below.)
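For the logarithm half of that problem specifically, Python itself can already take logs of 2-4k bit integers without going through a float, so the sketch below is a possible stopgap for feature extraction while the NGS rewrite remains open; the numbers and precision here are arbitrary:

```python
# Sketch: log features from 2-4k bit integers without float overflow.
# float(n) on a ~4k-bit integer raises OverflowError, but CPython's
# math.log2/log10 accept big ints directly, and the decimal module gives
# arbitrary-precision logs if a double's ~15-16 significant digits aren't
# enough for the model.

import math
from decimal import Decimal, getcontext

n = (2**2048 - 159) * (2**2047 + 9)    # arbitrary ~4k-bit product for illustration

print(n.bit_length())                  # exact bit length, no floats involved
print(math.log2(n))                    # log2 of a big int, computed without float(n)

getcontext().prec = 50                 # 50 significant digits
print(Decimal(n).log10())              # arbitrary-precision log10
```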
Current delivery estimate:
"Two more weeks", or "Fuck you!"
Trope · 1 point · Mar 12, 2025 01:16:22 (+1/-0)
You’re far less dismissive of others with this post in contrast to your last one.
I believe in you.
prototype [op] · 0 points · Mar 12, 2025 05:39:21 (+0/-0)
Thanks trope. Sometimes it's good to have people who push things along, and other times you gotta tell people who don't believe "fuck you", because doing what others believe is impossible is already hard enough. And sometimes the people who disbelieve are themselves the spite-fuel for continuing.
albatrosv15 · 1 point · Mar 12, 2025 00:31:44 (+1/-0)
prototype [op] · 0 points · Mar 12, 2025 05:36:13 (+0/-0)
ha, there is that.
There is the slim hope that they could all look at each other's darkest organizational and national secrets, say "oh shit, our nations are trapped in an iron-gripped, self-reinforcing system of massive mutual bribery and blackmail leading to runaway political, social, military, and economic decay, and if this goes on we're all collectively going to crash and burn in World War Three or end up in one of Orwell's dystopias", and immediately about-face to fix their respective systems.
Drstrangestgov · 1 point · Mar 11, 2025 22:35:40 (+1/-0)
prototype [op] · 1 point · Mar 11, 2025 23:02:31 (+1/-0)
lol.
titstitstits · 1 point · Mar 11, 2025 22:31:45 (+1/-0)
Drstrangestgov · 1 point · Mar 11, 2025 22:36:00 (+1/-0)
prototype [op] · 1 point · Mar 11, 2025 23:01:31 (+1/-0)
Yep, nerd stuff.
prototype [op] · 1 point · Mar 11, 2025 23:00:59 (+1/-0)*
continuously disappointing a dozen or so people who are patiently waiting for it to release.
The goal is to break RSA. The rundown is that the algebraic set is generated by an algorithm that derives a whole bunch of variables purely from a public key. All these variables de facto leak information about variables in a parallel group called the unknown set (UKS). If any exact relation can be found between a known variable and a variable in the unknown set, it leads to an O(1)-time exact solution to prime factorization, and hence a break of any public-key cryptography method that relies on the difficulty of prime factorization.
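As a standard illustration of why a single exact leak is enough (this uses p + q as the leaked quantity purely for illustration; it is not one of the actual derived variables):

```python
# Textbook illustration: if, besides n = p*q, any exact value of a simple
# symmetric function of the factors leaks (here the sum s = p + q), the
# factors fall out of x^2 - s*x + n = 0 immediately; no search at all.

from math import isqrt

def factor_given_sum(n: int, s: int) -> tuple[int, int]:
    d = isqrt(s * s - 4 * n)      # discriminant of x^2 - s*x + n; exact for valid (n, s)
    p, q = (s - d) // 2, (s + d) // 2
    assert p * q == n
    return p, q

print(factor_given_sum(3233, 61 + 53))   # (53, 61): the textbook RSA toy example
```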
I tried finding such a relation for a long time, almost four years now, and wrote an entire suite of automation tools to search for solutions, but always landed on inexact solutions that didn't work for all numbers. I tried bundling those with genetic algorithms to refine them toward being universally general, but that failed as well.
It was like having all the answers (the unknown set), but with all the answers locked in impenetrable boxes (the known set).
So I figured out the algebraic set is probably indeterminate (no exact solution, or infinitely many solutions that are trivial, i.e. they don't give the private key from a public key).
From there I moved on to machine learning models, and figured out that I could take known variables, let's say d4a and d4u, which between them are products of three unknown variables [d4, a, u], and use those as boundaries for estimates.
But how? Sometimes you'd have unknowns that were below 1, meaning a boundary that would normally act as an upper bound instead acts as a lower bound, and vice versa.
Initial testing with a bagging model gave very promising results (predicting to within 95% of the unknown's value in some cases, and up to 98% in many others). Of course that's not good enough, because of outliers.
But what if I didn't want to predict their values exactly? What if I just wanted to determine if some unknown that was part of a product or quotient in the known set, was inverted?
And from the testing that should be perfectly doable.
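A hedged sketch of that classification step on purely synthetic data; the feature construction below is invented for illustration (the real features would come from the algebraic-set values such as d4u, ut, cu, uv), and "bag model" is read here as a standard bagging ensemble:

```python
# Synthetic demo of the inversion classifier: hide a factor u (inverted
# half the time, i.e. u < 1) inside several "known" products/quotients,
# featurize with logs of the observed values only, and train a bagging
# ensemble to predict the inversion flag.

import math
import random
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def make_sample():
    inverted = random.random() < 0.5
    u = 10 ** random.uniform(-4, 0) if inverted else 10 ** random.uniform(0, 4)
    a, b, c = (10 ** random.uniform(0, 6) for _ in range(3))   # hidden co-factors
    observed = [a * u, b * u, c / u]                            # the "known" values
    return [math.log10(v) for v in observed], int(inverted)

data = [make_sample() for _ in range(20000)]
X = [feats for feats, _ in data]
y = [label for _, label in data]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = BaggingClassifier(DecisionTreeClassifier(max_depth=6), n_estimators=50)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```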
At some point I found an exact convergent solution to prime factorization, and hence a break of RSA. The caveat?
It was incredibly slow. For very large numbers, effectively slower than brute force.
Unless...
Unless I knew the magnitude and leading digit or two of one of the public key's factors.
In which case I could strip off multiple inner and outer loops, massively reducing the search space and bringing the running time down to polynomial, instead of NP or NP-intermediate.
In theory an exact polynomial-time algorithm isn't guaranteed to be fast, because the degree of the polynomial could be very large, but in practice most exact polynomial-time methods also admit a faster non-deterministic variant.
From what I've seen, starting with test keys where the private keys are known, the convergent set converges very fast, even for very large keys in the 1024-bit range, which is as far as I've currently tested.
On a lark about a week ago, I tried something that had been in the back of my mind for a while: an implementation variation on an approach I'd turned to before, certain modular manipulations of n (the public key) that closely approximate its factors. To what? To their magnitude and leading digits.
I realized pretty quickly that all these different approaches, with the right training and architecture, could be combined into one larger algorithm, each supplying a different piece of critical information that speeds up the downstream steps massively.
And thats where the work is at now.
fwiw, feeding the code into 'chatAnything' is liable to get you told it is highly experimental and/or ill-defined, and/or even 'impossible', because I'm working at the limit of mathematical tools and established understanding and techniques.
Just input the code for the algebraic set and ask "is there a way to get from n to its factors, without brute-forcing, assuming xyz derived variables", and the answer will be a universal no, or the machine will hallucinate, or assume some variable must be zero (even if you tell it otherwise) in order to come up with a trivial (non-useful) solution.
All the individual pieces, though, work as intended. The algebraic set leaks information about the private keys and, in training, gives us boundaries for the threshold set. The convergent set does in fact converge in testing on training examples and validation data sets. The modular set always yields the magnitude and leading digit of at least one private factor in testing, up to very long bit lengths, and so on.
None of them solves factorization by itself, but you can see, given the architecture diagram, how they combine to mutually give an exact answer.
If I recall, the only component I haven't posted here at some time is the convergent set algorithm.
And while the basilisks, as I named the series, only go up to 301 currently, I've in fact tested thousands of variations over the last 4+ years, never thinking to combine any of the approaches in a major way until now.
Part of it was the fact that I lacked a precise approximation of any of the unknowns, let alone the factors of a public key, until just recently. That was the piece that made everything else click into place.
I'm currently racing to complete it, versus the various military research projects in quantum. Whichever nation builds a workable general-purpose quantum chip first will very quickly pwn and dominate every other nation and organization the world over. If I beat all of them to a solution first, then I release it open source, and in short order every power on earth obtains it, and they all equally drag each other down into the dustbin of history: every organization, every religion, every economic power, every business, every NGO, every nation.
titstitstits · 1 point · Mar 11, 2025 23:53:09 (+1/-0)
are you sure you just need the magnitude to reduce the search space?
for example, if i know it's 100,000,000,000,000,000 then it still has that many values to search through.
prototype [op] · 1 point · Mar 12, 2025 05:21:41 (+1/-0)*
off the top of my head, a 2048-bit key is around 617 digits. That works out to roughly 6,200 iterations.
Each iteration has to go through a linear programming section that's 256 iterations (finding four configurations of four equations using four distinct variables).
There may be some other steps in between; it's been a while since I've looked at that particular module. But most of them can be stripped out with a good approximation of the factors as input.
There is then a secondary sequence that generates branches, all of which we have to run the final convergence step on. I programmatically determined the effective range, regardless of bit length, is somewhere between 15 and ~170 iterations, to balance run time against the bit length of the input.
With the leading digit and magnitude of one of the key's factors, we can strip out three of the four steps above, because early convergence tests let us determine, for example in the linear programming section, whether we have selected the correct configuration of equations and variables. Likewise, having an approximation of the factor lets us determine which subset of the branches is closest and contains the most probable candidate for convergence to the factors of n.
Without a good approximation going in, there's no way to determine whether the algorithm is converging on the factors, and not some other number, until the answer is at most the length of the root of the public key (that's the limit, because the largest factor will always be above the root, and the smallest factor below it, or equal to it in the absurd case of factoring a perfect square).
Without a really good guess or a really good approximation as input, the algorithm is blind. It's about getting from a 'close' approximation (which can still be a gargantuan and unsearchable space) to an exact output. The modular set gives us that: it yields the approximation needed for the rest to work.
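For a concrete sense of how much the leading digits plus magnitude narrow things (the numbers below are toy values, and the resulting band is still far too large to enumerate, which is exactly why it seeds the convergent set rather than being searched directly):

```python
# Arithmetic of the constraint: without any hint, the smaller factor of n
# lies somewhere in [2, isqrt(n)]. Fix its decimal length and first k
# digits and the range collapses to a single band of width ~10^(d-k).

from math import isqrt

def candidate_interval(n: int, leading: int, num_digits: int) -> tuple[int, int]:
    """Range for the smaller factor given its first digits and digit count."""
    k = len(str(leading))
    lo = leading * 10 ** (num_digits - k)
    hi = min((leading + 1) * 10 ** (num_digits - k) - 1, isqrt(n))
    return lo, hi

# Toy product with a 17-digit smaller factor that (on this test value) is
# known to start with "73". The band is ~10^15 wide versus ~8 * 10^16 for
# the unconstrained [2, isqrt(n)] range: roughly 80x smaller, yet still
# hopelessly large to brute-force.
p, q = 73_686_090_014_641_757, 90_396_455_870_872_141
n = p * q
lo, hi = candidate_interval(n, leading=73, num_digits=17)
print(lo <= p <= hi, hi - lo)   # True 999999999999999
```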