>Peter's group can analyse thousands or hundreds of thousands of papers an hour, automatically detecting errors and fraud and simultaneously making the data, which are facts and therefore not copyrightable, free. //
Facts aren't copyrightable but their presentation is. If they're copying the presentation of those facts in to a computer for commercial purposes (free release of the content of those papers is certainly commercial) then in the UK they'd be committing copyright infringement. There is also a possibility of infringing on EU database rights.
The OP link says:
>"Note that chemical structure diagrams are NOT creative works."
Whilst a plot or diagram may appear at first to lack "art" the presentation is formatted and certainly a "sweat of the brow" analysis supports it being copyrighted.
You can't scan someone else's molecular diagrams but you could look at them and enter the data yourself. Just like a chart, you can't duplicate the chart except by looking at it and copying the information - the presentation of which has been filtered away. Duplication in a computer produces an exact copy and AFAICT under UK legislation that would be infringing [silly as it is as the end result is the same it just takes more work].
Relying on only duplicating the image transitively in to non-persistent memory might satisfy the requirements not to make a copy. That would be an interesting test-case. How the law has been construed to allow the use of search engines would probably help here.
>take an ancient paper //
I'm curious what license you'd acquire the papers under that allows their duplication in this manner. Even ancient papers are often controlled by the library they're stored in or by those that performed their digitisation.
[FWIW I strongly disprove of this sort of the strength and duration of currently granted copyright]
I agree with all your analysis. In this case the presentation is discarded and, as I see it, only the facts are retained.
For example in parsing chemical structure diagrams what is recorded are things like the number of atoms in a section of a molecule, their angles and what kind of bonds exist to neighbouring molecules. These data are then analysed to generate the formula and re-construct a correct diagram.
There are no licenses to my knowledge on older papers that allow publishing the analysis (but within the University we have agreements with JSTOR, for example). Looking at public domain stuff is OK. This is part of the issue - that knowledge should be a public good.
If you're a lawyer I'm sure Peter would appreciate hearing your opinion.
>conversion from a rasterized scan to an "open markup language" //
[Skip to the end!]
[strikeout]
The problem is that a rasterised scan is a new - potentially unauthorised - copy. UK law tends to be more restrictive as we don't have the same sense of "Fair Use" as 17 USC.
Art 5.1 of the EU Copyright Directive (2001/29/EC; Section 28A of the UK CDP Act) at Section 1(b) allows for transient copies to be made when the copying is part of an otherwise allowed act. But the stipulation is that the copy can't have "economic significance".
Here the rasterisation then would appear to fail, even if it can be considered a transient part of the transfer of the information from copyright diagram to free-libre ML.
This to me - as a non-expert [though I consider myself pretty well read on copyright] - means that the conversion needs to be made without making an intermediate copy. A manual process would bypass the problem of making a copy for a computer program to analyse at the expense of lots of human input.
This is where things get silly as the end result is the same - the extraction of information from a catalogue of molecular diagrams - the process is just made more expensive. I rather hope my analysis is wrong actually and that a court would rule that scanning such works would be allowable in order to extract the informational content; would love to have more input here.
[/strikeout]
... actually further looking at S.28A(b) makes me think I am wrong; that this should be allowed. I'm convinced that the copies made aren't independently commercially significant and that as the process of extracting the information from the diagrams is an allowed use then the "transient copy" legislation makes this allowable.
I'm not sure that is crucial - I'll bet there are terms associated with the allowance of access to the already rasterised images along the lines of "solely for individual reading".
However I'll leave it there as it's too complex an issue to address generalities rather than the specific nature of the inputs, processing and analysis and intended uses, commercial aspects and such.
I pray every blessing on your knowledge sharing endeavours.
Facts aren't copyrightable but their presentation is. If they're copying the presentation of those facts in to a computer for commercial purposes (free release of the content of those papers is certainly commercial) then in the UK they'd be committing copyright infringement. There is also a possibility of infringing on EU database rights.
The OP link says:
>"Note that chemical structure diagrams are NOT creative works."
Whilst a plot or diagram may appear at first to lack "art" the presentation is formatted and certainly a "sweat of the brow" analysis supports it being copyrighted.
You can't scan someone else's molecular diagrams but you could look at them and enter the data yourself. Just like a chart, you can't duplicate the chart except by looking at it and copying the information - the presentation of which has been filtered away. Duplication in a computer produces an exact copy and AFAICT under UK legislation that would be infringing [silly as it is as the end result is the same it just takes more work].
Relying on only duplicating the image transitively in to non-persistent memory might satisfy the requirements not to make a copy. That would be an interesting test-case. How the law has been construed to allow the use of search engines would probably help here.
>take an ancient paper //
I'm curious what license you'd acquire the papers under that allows their duplication in this manner. Even ancient papers are often controlled by the library they're stored in or by those that performed their digitisation.
[FWIW I strongly disprove of this sort of the strength and duration of currently granted copyright]