Provenance After the Internet

Everyone is racing to answer one question about AI images: was this made by a machine? C2PA and Content Credentials, Google's SynthID, the marking clauses in the EU AI Act, California's SB 942 - all of it points in the same direction, at labelling machine output so platforms can flag it and regulators can enforce it. Useful work. But I got stuck on the opposite question, and it's the one that matters.

I've never been quiet about how I feel watching this happen. Someone spends days on a piece, and it gets scraped, run through a model into a same-but-different version, reposted with the signature cropped off, and there's nothing they can do to show it started with them. That's the part that gets me. Not that the tools to generate exist - that the person who made the original has no way to stay attached to it. The work gets taken, the credit goes with it, and the provenance industry is racing in the other direction.

Three questions

Three different questions hide under the word "provenance."

The industry asks: was this made by a machine? The creator asks: is this mine? And there's a third one, the one a normal person feels, scrolling Pinterest and falling in love with a piece of art they will never trace - whose is this?

The first question is the one with all the funding. The other two are the same query from opposite ends. "Is this mine" is a creator pointing at a stranger's repost. "Whose is this" is a fan pointing at an image they want to credit. Both are answered by the same thing - a system that can take a picture and tell you what original it came from. The tech to do it exists. What doesn't is anyone building it for the artist who got scraped or the fan who wants to find them, because the buyers with budgets are the platforms, not the creators.

That dual purpose is why this is worth building. The protection half is a grudging purchase - people reach for it after they've already been burned. The discovery half is a magnet. "Find who made this" is something people want to do for fun, and it's what gets enough work into a registry that "trace this back to me" starts working at all. The fun half pays for the serious half.

The wrong question

Walk through what happens to a piece of art that gets taken. Someone saves it off your post. They crop your handle out. They run it through an upscaler, or an img2img model with the strength turned up, so what lands on the reposter's account is recognisably your composition with none of your pixels. Then it's uploaded somewhere that recompresses it, strips the metadata, and resizes it to a platform cap. Three hops later it's a JPEG a thousand pixels wide that doesn't contain a single byte you put there.

Now ask the AI provenance stack to help. C2PA binds a file to a signed history, and that history holds as long as the manifest rides along with the file or a compatible tool preserves and re-signs the chain through each edit. Ordinary distribution does neither - a repost, a screenshot, a platform transcode that strips the metadata, and the hard binding is gone and the chain is broken. SynthID-class watermarks can tell you, with decent confidence, that a supported generator's signal is sitting in the pixels. Neither answers the creator's question, because the creator's question isn't "is this synthetic" and it isn't "is this byte-identical to a signed original." It's "this is a mangled derivative of something I made, and I need to show it's mine." Different problem. The tools built for the first one do not touch it.

The industry points the wrong way out of incentive, not stupidity. The buyers with budgets are platforms that need to label AI content for compliance, and regulators who need that labelling to exist. The artist whose work got scraped is not a procurement department. So the money builds detection and labelling, and the creator-side problem gets left as an exercise.

Watermarks

The obvious answer, the one everyone reaches for first, is the invisible watermark. Hide a signal in the pixels a human can't see but a detector can read, and the work carries its origin wherever it goes. Good idea. I've spent over a year building one - a classical, no-AI engine, shipped publicly once and hardened since - so this isn't a stranger throwing stones, it's me talking about my own work. And on its own it's a far weaker guarantee than the people selling it admit, mine included. I didn't want to take anyone's word for how weak, so I built the rig - embed a mark, attack the image the way the internet attacks it, try to read the mark back - and ran my own engine through it on a few dozen real photographs. The numbers below are mine, not a citation.

A well-built invisible watermark survives the benign things: re-encoding to JPEG or WebP, mild blur, brightness and colour shifts, a ninety-degree rotation. I watched it come back clean through compression down to low quality. For someone re-saving and reposting your work, a watermark earns its keep.

Then you hit the walls. There are three, in rising order of how few people talk about them.

The first wall is geometry, and most people get it wrong: the watermark doesn't get removed. A detector reads the mark by lining itself up with the signal, sample for sample. Rotate the image a few degrees, resize it to a resolution the embedder never saw, crop a strip off one side, and the signal is still sitting there in the pixels - the detector has just lost its place, and the correlation collapses to noise. Present and unreadable at the same time. When I cropped or downscaled the marked images, decode dropped to zero. Not degraded. Zero - while the same engine held perfectly against compression and blur. And a downscale is not an exotic attack. It's the single most common thing every platform does to every image it receives. Recovering from it is twenty-five years of solved signal-processing theory - Fourier-Mellin, log-polar resynchronisation - but it's real engineering with a real runtime cost, and plenty of shipped watermarking falls over on a plain resize.
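To make "present and unreadable at the same time" concrete, here is a toy sketch in plain Python. It is not the engine described above: the one-dimensional "image", the Gaussian host values, the strength of 2.0 and the helper names are all assumptions made for illustration, and the brute-force offset search is a stand-in for real resynchronisation, which for rotation and scale needs the Fourier-Mellin style machinery rather than a loop.

```python
import random

def make_carrier(n, seed):
    """Secret pseudo-random +/-1 sequence shared by embedder and detector."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(host, carrier, strength):
    """Add the carrier to the host samples at low amplitude."""
    return [h + strength * c for h, c in zip(host, carrier)]

def correlate(signal, carrier):
    """Sample-for-sample correlation; zip stops at the shorter input."""
    n = min(len(signal), len(carrier))
    return sum(s * c for s, c in zip(signal, carrier)) / n

rng = random.Random(0)
N = 4096
host = [rng.gauss(0.0, 10.0) for _ in range(N)]  # stand-in for pixel values
carrier = make_carrier(N, seed=42)
marked = embed(host, carrier, strength=2.0)

# Detector lined up with the signal: correlation sits near the embed strength.
aligned = correlate(marked, carrier)

# Crop three samples off the front. The mark is untouched, but every sample
# is now compared against the wrong carrier element.
cropped = marked[3:]
misaligned = correlate(cropped, carrier)

# Toy resynchronisation: try each candidate offset and keep the best one.
best_offset = max(range(8), key=lambda k: correlate(cropped, carrier[k:]))

print(f"aligned={aligned:.2f} misaligned={misaligned:.2f} offset={best_offset}")
```

The crop removes nothing from the mark, yet the naive detector reads noise until the search finds the offset again. The search is also where the runtime cost comes from: every extra degree of freedom (shift, scale, rotation) multiplies the candidates to try.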

The second wall is the one that can't be engineered away. Run the image through a diffusion model - the same regeneration that turns your composition into a same-but-different repost - and the watermark doesn't degrade, it dies. This isn't an implementation problem. There's a 2024 result, "Invisible Image Watermarks Are Provably Removable Using Generative AI", that proves it for the whole class of low-perturbation marks: add a bounded amount of noise, regenerate, and no detector can reliably recover the signal, by construction. UnMarker, at IEEE S&P 2025, went further - one universal attack that strips every watermark it was tested against, including the semantic ones meant to survive, with no knowledge of the scheme and no detector access, while keeping the image clean. The authors' own conclusion is that defensive watermarking is not a viable defence against a determined adversary with a model.

The third wall is the one almost nobody outside the security literature has heard of, and it's the most damning. Robustness is not security. A watermark surviving JPEG tells you it's robust - it withstands blind, accidental processing. It tells you nothing about whether it withstands an adversary who is trying. Classical spread-spectrum watermarks - the kind I built, and plenty of shipped ones - embed the same secret carrier in every image you mark. Which means the secret leaks. Collect enough of your watermarked outputs, run the right decomposition over them (PCA to find the subspace, ICA to pull the carriers out), and the key falls out of the statistics. Now the attacker doesn't just remove your mark, they forge it - stamp your watermark onto images you never made. The watermark-security work that proves this, Cayre and Furon and that line, is fifteen years old and almost completely absent from the current conversation about protecting artists. A watermark can be robust and insecure at the same time, and most people shipping them have never drawn the distinction. The neural schemes on the frontier don't use a fixed carrier, so they dodge this particular leak - and fall to the regeneration and adversarial attacks above instead. No paradigm gets to survive an accident and call itself safe against an adversary.
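The key leak is easy to reproduce in miniature. The sketch below is a toy under loud assumptions, not the attack from the literature: hosts are white Gaussian noise rather than real images, there is a single carrier (so PCA alone is enough and no ICA step is needed), and the dimensions, strength and function names are invented for the example. It uses power iteration in plain Python to find the leading principal direction of a pile of marked samples.

```python
import math
import random

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_component(rows, iterations=30, seed=1):
    """Leading principal direction of the rows, via power iteration."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    centred = [[r[j] - means[j] for j in range(dim)] for r in rows]
    rng = random.Random(seed)
    v = unit([rng.gauss(0, 1) for _ in range(dim)])
    for _ in range(iterations):
        # Multiply by X^T X without forming the covariance matrix.
        proj = [sum(a * b for a, b in zip(row, v)) for row in centred]
        w = [0.0] * dim
        for p, row in zip(proj, centred):
            for j in range(dim):
                w[j] += p * row[j]
        v = unit(w)
    return v

rng = random.Random(7)
DIM, COUNT, STRENGTH = 32, 600, 3.0

# The secret: one fixed carrier reused for every marked output.
secret = unit([rng.choice((-1.0, 1.0)) for _ in range(DIM)])

# What the attacker collects: marked samples only, never the key.
marked = []
for _ in range(COUNT):
    host = [rng.gauss(0, 1) for _ in range(DIM)]
    bit = rng.choice((-1.0, 1.0))  # the payload bit flips the carrier's sign
    marked.append([h + bit * STRENGTH * s for h, s in zip(host, secret)])

estimate = top_component(marked)

# Cosine between the estimate and the true carrier (sign is ambiguous).
alignment = abs(sum(a * b for a, b in zip(estimate, secret)))
print(f"alignment with secret carrier: {alignment:.3f}")
```

Because every sample carries variance along the same direction, that direction dominates the statistics and the estimate lines up with the secret. Real images are not white noise, so a real attack has more work to do, but the structure of the leak is the same.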

So the honest position: against laziness and ordinary handling, a good watermark works. Against geometry, it needs real engineering most products skip. Against regeneration, it loses by theorem. Against an adversary with a pile of your marked images, it leaks its own key. It's a speed bump and a provenance hint. It was never a lock, and anyone telling a creator it is one is selling something.

What actually works

If the watermark is the part that dies, the question is what carries the weight when it does. Stop trying to make a signal survive in the pixels. Match the content itself.

Take every image you want to protect and compute a perceptual fingerprint - not a hash of the bytes, which changes the moment a pixel moves, but a learned descriptor of what the image depicts. Store them in a registry. When a suspect image turns up, fingerprint it the same way and search. A model built for this - the copy-detection descriptors that came out of content moderation - will match a heavily cropped, recoloured, re-encoded, even partially regenerated derivative back to the original whenever enough of the visual structure survives, because it compares what the picture is, not what bytes it's made of. Push the edit far enough and the match softens into plain visual similarity, which is a lead and not a proof - but across the range ordinary reposting produces, it holds. It needs nothing to have survived in the pixels. It's the mechanism behind every "we found your song in someone else's upload" system that already works at scale.
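A minimal sketch of the lookup, with one big substitution stated up front: the fingerprint here is a toy block-mean hash, not a learned copy-detection descriptor, so it survives brightness, contrast and noise but not the crops and regenerations a real descriptor is built for. The synthetic images, the registry names and every function name are assumptions for illustration.

```python
import hashlib
import random
import statistics

SIZE, GRID = 32, 8
BLOCK = SIZE // GRID

def make_image(seed):
    """Synthetic greyscale image: coarse structure plus fine texture."""
    rng = random.Random(seed)
    coarse = [[rng.uniform(0, 255) for _ in range(GRID)] for _ in range(GRID)]
    return [[coarse[y // BLOCK][x // BLOCK] + rng.gauss(0, 3)
             for x in range(SIZE)] for y in range(SIZE)]

def byte_hash(image):
    """Cryptographic hash of the pixel bytes: changes if any pixel moves."""
    data = bytes(min(255, max(0, round(p))) for row in image for p in row)
    return hashlib.sha256(data).hexdigest()

def fingerprint(image):
    """64-bit block-mean hash: is each block brighter than the median block?"""
    means = []
    for by in range(GRID):
        for bx in range(GRID):
            block = [image[by * BLOCK + dy][bx * BLOCK + dx]
                     for dy in range(BLOCK) for dx in range(BLOCK)]
            means.append(sum(block) / len(block))
    median = statistics.median(means)
    return tuple(m > median for m in means)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return sum(x != y for x, y in zip(a, b))

def lookup(registry, suspect):
    """Return the nearest registered work and its distance: a lead, not proof."""
    fp = fingerprint(suspect)
    name = min(registry, key=lambda k: hamming(registry[k], fp))
    return name, hamming(registry[name], fp)

# Register twenty works by fingerprint only.
registry = {f"work-{i}": fingerprint(make_image(i)) for i in range(20)}

# A repost of work-5: contrast reduced, brightness shifted, noise added.
original = make_image(5)
rng = random.Random(99)
derivative = [[0.8 * p + 20 + rng.gauss(0, 2) for p in row] for row in original]

name, distance = lookup(registry, derivative)
print(name, distance, byte_hash(original) == byte_hash(derivative))
```

The byte hash changes as soon as the pixels do, while the fingerprint stays close enough for a nearest-neighbour search to find the source. The same `lookup` call serves both directions: a creator checking a suspect repost and a fan asking who made an image.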

This is the layer that answers both of the questions that matter. The creator's "is this mine" and the fan's "whose is this" are the same registry lookup pointed in opposite directions. The watermark, when it survives, upgrades a match from "this looks like your work" to "this carries your specific mark." But the registry is what catches the derivative the watermark couldn't, and it's the half that does the job people care about - on both sides.

A registry isn't magic either, and its failure modes are governance, not signal processing. If anyone can register anything, it's a land grab - whoever uploads your work first owns it in the index. So the registration has to carry its own provenance: a timestamp that predates the copy in dispute, an identity the registrant controls, public and private discovery kept separate so not every image resolves to someone's real name, and a path for the collisions that aren't theft - the shared pose, the meme template, the fan edit - where the visual match is real and the authorship isn't. The lookup is evidence, not a verdict; a registry that treats a nearest-neighbour distance as proof of copying is overclaiming the same way a watermark vendor does, only pointed at the creator instead of for them.

One detail shows up only if you've read both literatures, and it's the sharpest tell of where each approach stands. The copy-detection world's flagship benchmark, DISC2021, includes photographing a screen and re-uploading it as a standard test the descriptors are expected to beat. The watermarking world's flagship benchmark, WAVES, excludes physical recapture entirely. The two research communities draw their robustness lines in different places - and they draw them exactly where their method works and stops working. Retrieval is built to survive the photograph-of-a-screen; watermarking declines to be tested on it. If you want to know which tool to trust for an image that's been through the real world, that single asymmetry tells you more than any benchmark score.

The standards bodies already know all of this, even if their marketing doesn't. C2PA's own answer to the metadata-gets-stripped problem is a feature called Durable Content Credentials, and its design is the whole admission: a hard binding, the cryptographic hash, plus a soft binding - a watermark or a fingerprint - whose entire job is to re-discover the credential after the hard binding is gone. The people who wrote the standard understood the signed manifest doesn't survive the internet alone, and that you need a content-bound layer underneath. The field, when it's honest, converges on the same shape: no single mechanism solves this, so you layer them, and you stay honest about which layer caught what.

That last clause isn't a throwaway. A system that prints a confident "authentic" or "fake" is the one that gets destroyed the first time the two layers disagree - and there's published work on manufacturing exactly that disagreement, a cryptographically valid signature that contradicts its own watermark. Provenance systems can be made to lie about themselves. The system that survives a sceptic is the one that reports what it actually found: the mark decoded, or it was present but unreadable, or it was gone and the fingerprint matched at this confidence, registered at this timestamp, which predates the copy you're worried about. A graded answer that never claims more than it can defend. In a field this full of overclaiming, not overclaiming is the whole game.
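One way to picture a graded answer is a report that lists what each layer found and stops there. This is a hypothetical sketch: the `Evidence` fields, the `MarkState` names and the wording of each line are assumptions for illustration, not an existing system's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class MarkState(Enum):
    DECODED = "decoded"
    UNREADABLE = "present but unreadable"
    ABSENT = "not found"

@dataclass
class Evidence:
    mark: MarkState
    match_score: Optional[float]           # fingerprint similarity, None if no match
    registered_at: Optional[datetime]      # when the matched work was registered
    suspect_first_seen: Optional[datetime] # earliest known sighting of the copy

def report(e: Evidence) -> list[str]:
    """State what each layer found; never collapse it into authentic or fake."""
    lines = [f"watermark: {e.mark.value}"]
    if e.match_score is None:
        lines.append("fingerprint: no registry match")
    else:
        lines.append(f"fingerprint: matched at similarity {e.match_score:.2f}")
    if e.registered_at and e.suspect_first_seen:
        earlier = e.registered_at < e.suspect_first_seen
        order = "predates" if earlier else "does not predate"
        lines.append(f"registration {order} the suspect copy")
    return lines

# The regenerated-repost case: the mark is gone, the fingerprint still matches.
evidence = Evidence(
    mark=MarkState.ABSENT,
    match_score=0.91,
    registered_at=datetime(2023, 1, 1, tzinfo=timezone.utc),
    suspect_first_seen=datetime(2024, 6, 1, tzinfo=timezone.utc),
)
result = report(evidence)
for line in result:
    print(line)
```

Each line can be defended on its own, and a disagreement between layers shows up as two separate facts rather than a contradiction inside a single verdict.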

The floor

There's one thing none of this solves, and the honest version of the argument has to include its own limit. If someone sits down and redraws your work from scratch - same pose, same idea, their hand, their pixels - no watermark and no fingerprint proves that's yours, because at that point it might genuinely not be. That's a question about authorship and inspiration and law, not one a model answers, and any provenance system claiming to settle it is lying. The technical layers prove "this is a copy of a specific registered work." They do not prove "this idea was mine first." Knowing where that line sits, and not pretending it's somewhere else, is the difference between a tool and a pitch deck.

The map

So the honest map, the one the people selling provenance rarely draw because half of it is bad news for them:

Benign edits, the everyday repost - the watermark handles it.

Geometry, the crop and the universal downscale - solvable, real resynchronisation engineering, and the place most shipped watermarks quietly fail.

Regeneration and deliberate removal - the watermark loses by theorem, and the fingerprint registry carries it.

An adversary with a corpus of your marked work - the watermark leaks its key, and only a properly keyed, observation-limited design slows them down.

The from-scratch human redraw - nobody's system solves that, and the honest move is to say so.

Every case has an answer or an honest admission that it doesn't, and the thing tying them together is a layered system with a graded verdict, not a magic mark.

I measured the parts everyone takes on faith and read the security literature the current conversation skips, and it held from every direction: the industry is racing to answer the wrong question, and the right one - is this mine? - has a real answer nobody has put in the creator's hands. Not a magic mark. A layered system, honest about which layer caught what: a watermark for the easy edits, a content-fingerprint registry for the hard ones and for finding who made it, a timestamp that holds up, a verdict that never lies about its own confidence.

That's what provenance after the internet comes down to. Not surviving untouched, because nothing does - staying traceable to the person who made it, after the internet has done its worst to the file.