
We're in the homestretch boys. Cryptography goes VOOM.

submitted by prototype to whatever on Mar 26, 2025 02:53:30 (+6/-0) (whatever)

tl;dr

The modular set approximator generates candidate factors of a key, down to their exact magnitude and leading
digit(s). That is sufficient for the precise convergent algorithm component of the system.
It also does this in about 1/2 to 1/3rd the number of attempts bruteforce requires.

For example, if you wanted to generate all numbers under the square root of the integer form of a semiprime,
you might start with 1-9, then 10, 20, 30, 40..., then 100, 200, 300, and so on for every length
of number.

So for example, for a 325 digit, or 1028 bit key, you might be looking at 2925 candidate values,
and they'd be all over the place beneath the square root of the semiprime in our simplified example
(because RSA is a little more than simply multiplying two numbers together, involving something called
modular exponentiation which I won't go into, but fast factorization of arbitrarily large semiprimes
breaks it anyway).
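The enumeration described above (one candidate per leading digit at each magnitude) can be sketched in a few lines of Python; the function name is mine, not from the post:

```python
def magnitude_candidates(num_digits):
    # One candidate per leading digit (1-9) at each magnitude:
    # 1..9, then 10..90, then 100..900, and so on.
    return [d * 10**k for k in range(num_digits) for d in range(1, 10)]

# A factor of up to 325 digits gives 9 * 325 = 2925 candidates,
# matching the count given above.
cands = magnitude_candidates(325)
```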

The modular set algorithm has a non-trivial number of candidates that break this distribution, with a
more-than-average number that approximate the keys we are trying to target. The bruteforce approach,
by contrast, only ever has a single candidate, making its candidates indistinguishable and thus
useless unless you have a way of greatly winnowing them.

This is important because the last piece of the puzzle, the convergent algorithm, has worse performance
than bruteforce when it is given a bad initializing approximation of the factor we are attempting to
precisely converge on, and there is no way to tell it isn't converging on a correct answer until it
has run completely and returned an answer that doesn't work.

So by attaining a candidate list through the modular algorithm, we enormously shrink the run time of the final step in the key factorization process.

Additionally, the candidate list generator, in both its size and its performance, is the least-optimized component, implementing almost no meaningful filtering at this time. In testing, convergence in the final step can be sped up tremendously using the more precise versions of the modular algorithm, which typically return both the correct magnitude of the target factor and 2-4 of its leading digits.

What follows is my always-janky research logs for those who want to dig through them.



RESEARCH LOG
research 3.17.2025, 3.18.2025


I succeeded in getting a very fast convergent approximation of a semiprime's factors.
It is highly insensitive to bit length.
The next step is to test over a wide bit range, and/or over lots of semiprimes
in the higher bit ranges (RSA-worthy).



NOTE

So if we assume the new modified version requires a loop of 21x21 (-10 to +10 for i and j)
to guarantee one set of candidates has a good approximation, we're still looking at,
say, a million or so candidates per key (for a 4096 bit RSA key).
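A minimal sketch of that 21x21 sweep, with a placeholder standing in for the (unpublished) modular set generator:

```python
def sweep_ij(generate_candidates, span=10):
    # Hypothetical sketch: iterate i and j from -span to +span inclusive
    # (21 x 21 = 441 combinations for span=10) and collect whatever the
    # generator returns for each (i, j) pair.
    results = {}
    for i in range(-span, span + 1):
        for j in range(-span, span + 1):
            results[(i, j)] = generate_candidates(i, j)
    return results

# `generate_candidates` here is a stub; the real generator is not in the post.
grid = sweep_ij(lambda i, j: [], span=10)
```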

Which means we're ready to develop the filtering model.



THE PLAN as of 3.18.2025

Next steps would be to
- write a front facing interface
- generate a lot of primes and semiprimes over night
- check if 1. their candidate passes a minimum closeness threshold (1.0 or 1.00 closeness being the minimum) or not
- try randomization of kiju as well
- write a graphical heatmap that maps k (optional) to average closeness of output to the target factor
- do the same for i to closeness
- do the same for j to closeness
- calculate the standard deviation of the set of all candidate values quotients (CQs)
- likewise for i values
- likewise for j values
- optionally k values as well

So for example, what we'd do is pull all unique i values, and the associated CQ list for each individual value of i.
If i has three instances of 5, we'd pull all candidate entries where i=5 and append them to one list.

What we're looking for between variables (k, i, j, u) is the variable
with the lowest std among its CQs.
We want a low std, but we also want CQ values that are on average low in the absolute sense.


Average would be better here, because large outliers across kiju values are likely to produce higher averages,
indicating volatility in the parent variable, be it k, i, j, or u.
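The grouping-and-stats step described above could look something like this (names are mine; the post doesn't include code for it):

```python
from collections import defaultdict
from statistics import mean, pstdev

def cq_stats(candidates):
    # Group candidate quotients (CQs) by the parameter value that produced
    # them -- e.g. every entry where i == 5 goes into one list -- then
    # report (average, population std) per value. A low std together with
    # a low average marks the most stable parameter values.
    grouped = defaultdict(list)
    for value, cq in candidates:
        grouped[value].append(cq)
    return {v: (mean(cqs), pstdev(cqs)) for v, cqs in grouped.items()}
```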


ALTERNATIVELY
Ignore all this shit above and..

- Generate a bunch of factors and semiprimes
- vary kiju randomly
- in the morning, for each unique value of i, j, and u that appears get some kind of score
- lower scores are probably going to be better

You can go for the average, where you sum the CQs of each value of, say, i, and then divide by the number summed,
or you can go by the simple sum, and return the unique value of i that has the lowest sum.

We could also go with a genetics approach, where failure to find a close-enough candidate kills the kiju,
and then we'd find the range of kiju that most reliably finds a candidate.

Or we could add a simple score for each that succeeds.
The smallest-sum model probably works better, though, because it doesn't just penalize for being wrong,
but for the magnitude of wrongness.

After finding the best scoring values for each individual variable of kiju,
then we'd look to see what combinations of the individuals keeps those scores.
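A sketch of the two scoring modes (sum vs. average), assuming CQs have already been grouped per unique parameter value; function and parameter names are illustrative:

```python
def best_value(grouped_cqs, mode="sum"):
    # Score each unique parameter value by the sum (or average) of its CQs
    # and return the value with the lowest score. The smallest-sum model
    # penalizes by magnitude of wrongness, not just for being wrong.
    def score(cqs):
        total = sum(cqs)
        return total / len(cqs) if mode == "avg" else total
    return min(grouped_cqs, key=lambda v: score(grouped_cqs[v]))
```

Note the two modes can disagree: a value with many small CQs wins on average but loses on sum.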





RESEARCH 3.18.2025

Waking up to do research. Wrote a good gameplan for today's research.
Today is all about investigating the kiju vectors, and understanding
how they are related.

It was clever to use dictionaries with embedded lists, something
I learned from earlier projects, instead of endlessly nested lists.

By adding values to keys in dictionaries, I can also assess how often
that key showed up, which allows me to do things like get averages.

I may or may not continue to filter values by smallest candidate
(has the leading digit string '1.00', for example).
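That leading-digit-string filter might look like the following (illustrative only; the prefix '1.00' keeps quotients within about 1% above the target):

```python
def close_candidates(cqs, prefix="1.00"):
    # Keep only candidate quotients whose decimal string starts with the
    # given prefix, e.g. '1.00' -> quotients in [1.00, 1.01).
    return [q for q in cqs if f"{q:.6f}".startswith(prefix)]
```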



Things to try:
- optimize the constants to something other than pi in
((len(r) * (1/(pi**0.5))) + (len(r) * (1/(pi**0.5))))/2
so that we maximize convergence speed and precision, while minimizing the kiju search space
- fix the try-except bug in test_kiju
- alternatively note total tries versus successes, and measure kiju vector fitness based on this
- continually try to increase precision to see what that does
- try to find a high precision that nevertheless converges fast and/or minimizes the search space range of the kiju vector variables
- build a visualizer to look at the clustering, std, averages, heatmap, of the data
- externalize the len(r) calculation to fix the try-except bug
- add a fitness score for fast convergence (low values in kiju, and/or not having to search a big candidate list)

All of these amount to parameter sweeping during training, tuning the parameters to optimize search.
u remains fixed in the vector, as it's been found to be optimal enough; it was the first parameter to be optimized to begin with.

Open questions:
If we select the individually optimal values of k, i, and j to minimize
their respective averages or even stds, will combining them also
result in a universally low value, or are there inherent tradeoffs between them?
Are some values of i insensitive to k, k insensitive to j, j insensitive to i, etc.?
Can we get high mutual insensitivity that nevertheless optimizes convergence (high average precision, low search time/early detection
in a candidate list)?



NOTE 2

I generated values at 256 bit, i.e. for 512 bit RSA public keys.

Then I used

for entry in iz:
    print(f"entry: {entry}, value: {iz[entry]}, len: {len(iz[entry])}")

And noticed the length of the outputs. What we might do is take all
keys with output lengths over 1, add their k values together (negative or positive as they may be),
then average them. If the average is negative, we multiply it by -4 to get the positive end of the range
and by 4 to get the negative end.
If it's positive, multiply the average by 4 to get the upper end and by -4 to get the lower end.


Data for iz:
-53 len: 3
-39 len: 2
-30 len: 2
-52 len: 3
42 len: 2
35 len: 2
-6 len: 2
-48 len: 2
4 len: 2
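The averaging-and-scale-by-4 heuristic above, run against this iz data, could be sketched as (function name is mine):

```python
def k_range(iz):
    # Average the k values of all keys whose output list length is > 1,
    # then scale the average by -4 and +4 (sign-aware, per the note above)
    # to get a search range.
    ks = [k for k, length in iz if length > 1]
    avg = sum(ks) / len(ks)
    if avg < 0:
        return (avg * 4, avg * -4)   # (negative end, positive end)
    return (avg * -4, avg * 4)

# The iz data from the log above, as (k value, output length) pairs.
iz = [(-53, 3), (-39, 2), (-30, 2), (-52, 3), (42, 2),
      (35, 2), (-6, 2), (-48, 2), (4, 2)]
lo, hi = k_range(iz)
```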



NOTE 3


For k, i, and j, the lower the average, either the more empty elements are present in that variable's data,
or the smaller the sum.

In the case of an average close to 1, that means the parameter is relatively insensitive,
not producing lots of wild swings in value.

When it is far from 1 (>1.5 or <0.5), that means most values are empty,
and values concentrate into lists at only a few indexes, indicating possible magic numbers
or good starting points.


NOTE 4


After testing I determined the following...

323/1000, success: 98.45%, 4 misses.
rsa public key size: 1024 bits.

Works out to about 1.25 misses per 100 rsa keys.


For 2048 bit rsa public keys, accuracy hovers around 85-90%

kij, or xyz vector is -1, -3, 1


another
i:134/1000, hits/total: 0.9627, misses: 4, avg: 0.9182859627497632
xyz: -1, -3, 1



Research 3.26.2025


I managed to get a UI running for the mod verifier and ran the process
with another machine to be sure. The kiju vector [-1, -3, 1] worked at
high bitlengths (2048 bit) for almost all cases (appeared to be converging to 100% before test
was stopped for lack of time).

The test consisted of running the candidate generation script
on a separate machine.
The public key was shared, transferred onto the candidate-generation machine,
candidates were saved to text file as 'data.txt', and then sent back
to the machine the verifier was running on.
The verifier confirmed one or more matching candidates were found in the set.
The sets are about 1/3rd the size of the number of candidates you'd
need to check if you checked all magnitudes with all leading digits, i.e.
1-9, 10-90, 100-900, etc.

The verifier did not attempt to see if closer matching candidates existed
in the set, only stopping on the first one found.

One way of doing this is to simply reverse the checks: check for
the first 4 matching characters, then the first 3, then
the first 2, then the first 1.

We could generalize it to find the best possible match at some point.
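The reversed prefix check could be sketched like this (hypothetical helper comparing decimal string prefixes; returning the matched length generalizes toward a best-possible-match search):

```python
def best_prefix_match(candidates, target, max_len=4):
    # Check for a candidate matching the first 4 characters of the target,
    # then the first 3, then 2, then 1, returning the first (and therefore
    # closest) match found along with the matched prefix length.
    t = str(target)
    for n in range(max_len, 0, -1):
        for c in candidates:
            if str(c)[:n] == t[:n]:
                return c, n
    return None, 0
```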

I'm also thinking I should keep the load_data button and copy_btn that
copies the keys to clipboard.
An 'export all' button and an 'export public key' button would be useful.

Putting text labels '1.' and '2.' above each button would
help guide the process, allowing users to start over instead of
having to restart and regenerate the keys from scratch.

I need to test the system on the root-filtered candidates as well.

I should write a verifier with an embedded candidate generator
so I can test on my system directly, without having to go back
and forth between text files.

I'd probably need a 'check internal candidates' or 'load
internal candidates' button for that.

A field to specify the name of the data file would be nice as well,
but optional.


After doing that, the next steps are to
1. translate element values, names, formulas, and element type (product, quotient, remainder of subtraction, and sum)
to a dictionary structure

2. write a custom export script

3. write a system that lets me pick and choose which get exported

4. write a data generator (number of products, and which elements) with exporter,
and include the new bitlength-based generator in explore3j or basilisk 291.

5. write a UI front-end for this, with a selectable list of elements to include
(or hit 'randomize'), as well as saving/loading templates of these.

6. optionally show values (cut off to some number of characters, along with element name and magnitude, and/or a column with family type).

7. allow generating data without exporting.

8. optionally include importing

9. add an option to convert to log values for element values (or pre-convert them).

10. an optional column specifying whether each known and/or unknown is positive/negative, and whether it is inverted (abs(n) < 1) or not.

11. bootstrap the old AutoML code and test it with some dry runs small scale data (1k, 5k, 10k, 50k products and their variables).

12. write a UI for this, showing iteration, batch size, progress, loss, etc.

13. generate training and validation data for the AutoML script

14. train a model for boundary threshold generation with big RSA keys.

15. find the old code, or rewrite the function to convert from the hexadecimal representation of RSA keys to integers

16. do the same, but write it to convert from integers to hexadecimal representation.

17. once the dictionary-family system reorganization is in place, assuming we know what variables are inverted,
write an algorithm to establish an ordering or rank system, expressing bounds in element names
(possibly use random word combinations to name these automatically).

18. once the inversion detector works, and the boundary-set can be generated (the rankings) with confidence
(because we know what unknowns are inverted), generate candidates in the modular set module, and then
filter them with the boundary set. Winnow candidates during tests to see how small we can get the set.

19. explore any other convergent algorithms in the research docs before moving on, because there may be an easier way
than the convergent set (CS) algorithm I'm currently using now that I have access to a precise and consistent approximation mechanism of the private keys.

20. If not, proceed to clean up the CS into a single, clean, straightforward piece of code.

21. Rewrite manual convergence steps into automatic convergence output values for generating test data.

22. Train a model on lots of convergence step numbers and the variables (both known and unknown) of randomly generated RSA
keys

23. Likewise train a model using products, their knowns, and their best candidate subset (after all filtering)
to approximate knowns

24. figure out how to get model performance out as either a variable or a text/csv export

25. Write a system that automatically 1. tracks combinations of parameters (variables) and 2. their performance,
with saving/loading of these combos.

26. winnow and parameter sweep till we find the best combination of variables for predicting the convergence step values.

27. put it all together into one coherent interface.

28. open source and pop some popcorn from the radiant heat of modern globohomo civilization's collapse.

29. nothing of value was lost.
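The hex-to-integer conversion steps of the plan (items 15 and 16) are one-liners in Python; a minimal sketch:

```python
def hex_to_int(h):
    # Parse the hexadecimal representation of an RSA modulus (optionally
    # colon- or space-separated, as OpenSSL prints it) into an integer.
    return int(h.replace(":", "").replace(" ", ""), 16)

def int_to_hex(n):
    # Render the integer back as lowercase hex, without the '0x' prefix.
    return format(n, "x")
```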





10 comments


[ - ] RevengeOfNeri 3 points Mar 26, 2025 04:41:38 (+3/-0)

Someone needs to cut down on their caffeine intake 🤣

[ - ] prototype [op] 0 points Mar 26, 2025 10:20:41 (+0/-0)

I rearranged the four key parameters into the word kiju, because godzilla hates globalism lololol.

[ - ] RevengeOfNeri 1 point Mar 26, 2025 22:17:32 (+1/-0)

Maybe increase the lithium intake at same time.

[ - ] prototype [op] 0 points Mar 27, 2025 02:18:46 (+0/-0)

No lithium intake but I drink a ton of lead-contaminated water.

It's fucking delicious.

Also makes me immune to radiation or getting old. Sucks to be you!

[ - ] deleted 1 point Mar 26, 2025 06:30:30 (+1/-0)

deleted

[ - ] prototype [op] 0 points Mar 26, 2025 10:31:53 (+0/-0)

Nice tweaks

Getting there. I noticed I neglected to mention that the averages look wrong because they're actually running averages.

The testing numbers also underrepresented the total number of keys tested, because the ones listed are just from the latest tests.

I could do bigger key tests, both in the quantity of keys tested and in key bit lengths, except I don't have the compute for it and I'd be waiting days, because the testing, which needs to be robust, actually takes longer than the generation and decryption code.

[ - ] deleted 1 point Mar 27, 2025 07:20:20 (+1/-0)

deleted

[ - ] prototype [op] 0 points Mar 27, 2025 14:41:38 (+0/-0)

I used the running averages at the 2048 bit length to adjust the vector parameters to what they are now.
Testing's done.

On to all the other shit that still needs done.

[ - ] Crackinjokes 1 point Mar 26, 2025 07:50:35 (+1/-0)

I understand about a fourth of that but it sounds like you're making real progress

[ - ] prototype [op] 0 points Mar 26, 2025 10:24:10 (+0/-0)*

That's okay; there's a 50/50 chance I won't understand it on any given day I look at the code.

The step-by-step plan outlined above is about as close as I've ever got, knowing that the other components are tested and working. I'd be less certain if I hadn't already trained and tested neural net models on smaller keys, because learning that took time and a lot of confusion. At the time I knew the theoretical implementation, just not the math and the step-by-step practical implementation.

I'm glad now I kinda gave up and tried the machine learning angle before coming across the modular algorithm.

If I don't understand the individual components of an architecture, it feels like it's all up in the air, like throwing data at a dartboard, because how do you know some component even solves the problem it is tasked with solving when you don't understand what it is doing?

Four months ago I wouldn't have been able to build the current integration process of the various sub-algorithms, even if I'd had the modular generator part back then, precisely because I only knew how to use prebuilt ML libraries, and not enough about the process to fully implement them or understand why the machines took the inputs or produced the outputs that they do.