
I've posted an open question about RAG and image generation. Anyone who wants to can chime in. Image RAG?

submitted by deleted to AI 4 months ago (Feb 18, 2025 22:22:20) (+1/-0)     (AI)

deleted


2 comments


MaryXmas 0 points 3 months ago (Feb 19, 2025 18:49:22) (+0/-0)

RAG is very weak on relationships: it pulls in chunks by similarity, so the connections between them are basically nonexistent. You end up with RAG maps of sorts, but still only in a prompt sense. For images, you basically want to train on images rather than use RAG functionality. You would want an SLM (small language model) focused on your training data. I think you will find the setup fairly difficult, though, which is why it is probably not worth it.

You could use the select-and-edit feature in the DALL·E GPT and just ask it to edit the selection; that might work.

Cantaloupe 0 points 4 months ago (Feb 18, 2025 23:12:11) (+0/-0)

So image text chunks for augmentation?

Essentially:

Images into CLIP or BLIP embeddings
Queries into embeddings
Retrieve based on similarity
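
Rough sketch of the retrieval step, in case it helps. This assumes the image and query embeddings already exist; the tiny 3-number vectors and the file names below are made up for illustration and just stand in for whatever CLIP or BLIP would actually produce. `cosine_similarity`, `retrieve`, and `toy_index` are hypothetical names, not from any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, index, top_k=2):
    """Return the top_k (image_id, score) pairs, best match first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), image_id)
        for image_id, embedding in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]


# Assumption: toy vectors standing in for real image embeddings.
toy_index = {
    "cat.png": [0.9, 0.1, 0.0],
    "dog.png": [0.1, 0.9, 0.0],
    "car.png": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # assumption: an embedded text query

results = retrieve(toy_query, toy_index, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

A real setup would probably swap the dict for a vector index, but the ranking idea is the same.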

Image-to-text model:
BLIP-2, GPT-4V, PaLM-E

Descriptions into LLM for processing

Extract features: OCR text, Tesseract?

Stable Diffusion
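
And a minimal sketch of the augmentation part, assuming the retrieved images have already been captioned by one of the image-to-text models above. It only assembles the text that would go to the LLM or the image generator; no model is called. `build_augmented_prompt` and the prompt layout are hypothetical, just one way to do it.

```python
def build_augmented_prompt(user_request, retrieved_captions, max_captions=3):
    """Prepend retrieved image descriptions to the user's request."""
    context_lines = [f"- {caption}" for caption in retrieved_captions[:max_captions]]
    if not context_lines:
        return user_request  # nothing retrieved, pass the request through
    return (
        "Reference image descriptions:\n"
        + "\n".join(context_lines)
        + "\n\nRequest: "
        + user_request
    )


# Assumption: example captions, as if returned by a captioning model.
captions = ["a red barn in a snowy field", "a wooden fence at sunset"]
prompt = build_augmented_prompt("paint a farm in winter", captions)
print(prompt)
```

Capping the number of captions keeps the prompt short, which matters for image models with small text limits.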