You should really put some usage instructions on the README.
uv run --with PyMuPDF --with pillow ./unredactor-main/unredact.py
I tried a couple PDFs but get "Failed to open PDF: bad argument type for built-in operation".
Redactle.net has something similar where you can double-click or tap-hold then type a note over the redacted word.
> Republishing altered documents is illegal
what exactly does this mean? misrepresenting the altered document as unaltered?
I can't imagine it being illegal to do madlibs
> I can't imagine it being illegal to do madlibs
Of course not, but for one requirement: Zapf Dingbats!
This seems unlikely to be illegal unless you're representing them improperly.
That's the point though. You cannot just write anything and put it up.
It must be accurate. That said, you still shouldn't reupload your altered document anywhere.
I guess you mean official legal documents or something, but your sentence doesn't say that or mention those, so it comes across as very confusing. As written, it implies that using Word is illegal, because every time you type something you alter your document.
Thank you! The OP is being very ambiguous and cavalier with language.
Why not? In some cases it might amount to fraud or something, but in general, why would it be prohibited?
this tool coming out on the heels of the DOJ releasing a trove of redacted documents doesn't come across as coincidental to me. let's think about this for a bit longer from that idea of using this on legal evidence...why would doctoring a legal document be prohibited?
Generally there is nothing illegal about altering a legal document, or even a strict definition of what counts as a legal document. Under some circumstances it could be illegal to alter a document and use that for fraud, or submit an altered document to a court or government agency. If the doctoring falsely defames someone then you could also open yourself up to a civil suit.
You do you but I advise you don't.
Standard CYA procedure
For all we know, Epstein could have punished Trump and made him write "I'm a little bitch boy" 2,000 times and it took up 119 pages so every line got redacted. /madlibs
Free Law Project also has this open source tool to detect bad redactions: https://github.com/freelawproject/x-ray
The point is you can perform a box dimension attack.
If you have a known input, you can match all outputs.
Example: Document that DOJ took down and reuploaded that redacted Trump's name when it was previously available. They used the same size boxes in each location.
You cannot do this with handwriting, but fonts have known widths.
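A minimal sketch of that width-matching idea, assuming you already know the font and size. The advance-width table, the word list, and the function names here are all made up for illustration; a real attempt would read the metrics from the actual font file.

```python
# Illustrative advance widths in 1/1000 em units. These are assumed values
# for the sketch, not metrics read from any real font file.
ADVANCE = {
    "a": 556, "b": 556, "c": 500, "d": 556, "e": 556, "f": 278, "g": 556,
    "h": 556, "i": 222, "j": 222, "k": 500, "l": 222, "m": 833, "n": 556,
    "o": 556, "p": 556, "q": 556, "r": 333, "s": 500, "t": 278, "u": 556,
    "v": 500, "w": 722, "x": 500, "y": 500, "z": 500,
}


def text_width(text, font_size=12.0):
    """Width of `text` in points at `font_size`, ignoring kerning."""
    return sum(ADVANCE[ch] for ch in text) * font_size / 1000.0


def candidates_for_box(box_width, words, font_size=12.0, tolerance=0.5):
    """Keep the words whose rendered width is within `tolerance` points of the box."""
    return [w for w in words if abs(text_width(w, font_size) - box_width) <= tolerance]


# Hypothetical candidate list and a hypothetical measured box width (in points).
WORDS = ["mallet", "window", "little", "summit"]
matches = candidates_for_box(40.0, WORDS)
print(matches)
```

The tolerance is there because a redaction box is usually drawn with some padding, so an exact match is too strict.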
Couldn’t it be the same letters in a different order?
A probabilistic attack on redaction is still an attack.
You'd never be blasé about the same information about your password.
Plus with redaction there's a pretty small number of possible words when the boxes are small.
depending on the font used, the spacing between letters can change depending on what letters are next to each other.
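To make the kerning point concrete, here is a rough sketch where two strings with the same letters come out at different widths. The advance widths and kerning pairs are assumed values for illustration, not taken from a real font. Without kerning pairs the two strings would measure the same, which is why the attack stays probabilistic in that case.

```python
# Assumed advance widths and kerning adjustments in 1/1000 em units.
# Both tables are hypothetical and only cover the letters used below.
ADVANCE = {"A": 667, "V": 667}
KERN = {("A", "V"): -74, ("V", "A"): -74}


def kerned_width(text, font_size=12.0):
    """Width in points, adding a kerning adjustment for each adjacent pair."""
    units = sum(ADVANCE[ch] for ch in text)
    units += sum(KERN.get(pair, 0) for pair in zip(text, text[1:]))
    return units * font_size / 1000.0


# Same letters, different order: "AVA" has two kerned pairs, "AAV" has one.
print(kerned_width("AVA"), kerned_width("AAV"))
```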
why unredact, rather than just edit the pdf to remove the redaction box and insert whatever you want? presumably you'd want a viewer to see that you modified a redaction, but why?
From a previous post of the author, I guess the motivation is to write the text back on top of the black boxes.
Anyone using PDF features to redact is just not doing it right.
With regards to the Epstein files, it seems some files are not redacted well.
For instance, this file says Mona if you remove the top layer https://www.justice.gov/epstein/files/DataSet%208/EFTA000136...
Some others I've seen include 1-3 more letters than are in the redaction.
Are there tools for trying to predict possible fits for redacted data given font, black bar size, and context?
In some redacted documents, there is even an alphabetical word index at the end with a list of pages on which the words appear.
The redacted words are also redacted in the word index, but the alphabetically preceding and succeeding words are visible, as is the number of index lines taken up by the redacted word's entry, which correlates with the number of appearances of that word.
This seems like rather useful information to constrain a search by such a tool.
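Something like the following could use the index to narrow a candidate list. It is only a sketch: the function names, the word list, and the assumption that an index line holds a fixed number of page references are all hypothetical.

```python
def index_candidates(words, prev_word, next_word):
    """Words that sort strictly between the visible neighbours of the redacted index entry."""
    lo, hi = prev_word.casefold(), next_word.casefold()
    return sorted((w for w in set(words) if lo < w.casefold() < hi), key=str.casefold)


def page_ref_range(entry_lines, refs_per_line):
    """Rough (min, max) number of page references for an entry spanning `entry_lines` lines.

    Assumes every full index line holds exactly `refs_per_line` references,
    which a real layout will only approximate.
    """
    return ((entry_lines - 1) * refs_per_line + 1, entry_lines * refs_per_line)


# Hypothetical vocabulary and hypothetical visible neighbours in the index.
WORDS = ["mallet", "window", "little", "summit", "marble"]
print(index_candidates(WORDS, "magnet", "marker"))
print(page_ref_range(2, 8))
```

The surviving words can then be checked against the box widths, and the reference range against how many same-sized boxes appear in the document.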
Yes.
I was thinking something similar. I wonder whether, if the font uses kerning and you know the rendering engine and the algorithm used to draw the boxes, you could even get the exact text back. Or, at a minimum, rule out words based on the available information. Not a field I am familiar with, but I bet there are a lot of ways to uncover the redacted values.
I don't know what fonts are typically used in redacted documents, but surely this kind of technique could be rendered useless by a monospace font?
Seems silly not to use a monospace font in these cases.
Wouldn’t a monospace font provide more information since you can extrapolate the exact number of characters?
My guess is that is actually less information than you get from a variable width font.
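A toy comparison that leans the same way, under heavy assumptions: the width table is made up, and the vocabulary is six hand-picked words of equal length. It counts how many words are indistinguishable from each other when all you can observe is the box width.

```python
from collections import Counter

# Assumed proportional advance widths in 1/1000 em units (illustrative only).
ADVANCE = {
    "a": 556, "b": 556, "c": 500, "d": 556, "e": 556, "f": 278, "g": 556,
    "h": 556, "i": 222, "j": 222, "k": 500, "l": 222, "m": 833, "n": 556,
    "o": 556, "p": 556, "q": 556, "r": 333, "s": 500, "t": 278, "u": 556,
    "v": 500, "w": 722, "x": 500, "y": 500, "z": 500,
}


def proportional_units(word):
    """What a box reveals under a variable-width font: the summed advances."""
    return sum(ADVANCE[ch] for ch in word)


def bucket_sizes(words, measure):
    """For each word, how many words in the list share its observable width."""
    buckets = Counter(measure(w) for w in words)
    return {w: buckets[measure(w)] for w in words}


# Hypothetical vocabulary; every word has six letters on purpose.
WORDS = ["mallet", "window", "little", "summit", "marble", "pillow"]
mono = bucket_sizes(WORDS, len)                   # monospace: only the length leaks
prop = bucket_sizes(WORDS, proportional_units)    # proportional: the width sum leaks
print(mono)
print(prop)
```

In this toy set the monospace box leaves all six words tied, while the proportional widths separate every one of them. A real font and a real vocabulary will produce collisions in both cases, so treat this as an illustration rather than a measurement.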
Either way, fixed or with index lines.
This just attempts to match box dimensions.
https://libraryofbabel.info/
Does it even matter? The kind of people who see stuff like this and are still fine with it are likely fine with anything else that's discovered as well.
The truth has become irrelevant.
https://www.justice.gov/epstein/files/DataSet%208/EFTA000250...
I'm sure people will ask ChatGPT to do this very thing, so it's a good thing LLMs never make shit up
> lets you put your own information over a redaction box.
This doesn't remove redactions, it lets you write over them.