>Peter's group can analyse thousands or hundreds of thousands of papers an hour...

Blahah · on Feb 19, 2014

I agree with all your analysis. In this case the presentation is discarded and, as I see it, only the facts are retained.

For example, in parsing chemical structure diagrams, what is recorded are things like the number of atoms in a section of a molecule, their angles and what kind of bonds exist to neighbouring atoms. These data are then analysed to generate the formula and reconstruct a correct diagram.
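To make that concrete, here is a rough sketch of the "facts to formula" step. It is not Peter's actual software: the `ATOMS`/`BONDS` record layout, the `VALENCE` table and both function names are made up for illustration, and it assumes undrawn hydrogens can be filled in from the usual valences, which real chemistry doesn't always respect.

```python
from collections import Counter

# Hypothetical record of what a diagram parser might extract: one entry per
# drawn atom, plus bonds as (atom index, atom index, bond order).
ATOMS = ["C", "C", "O"]            # ethanol skeleton: C-C-O
BONDS = [(0, 1, 1), (1, 2, 1)]

# Usual valences for a few common elements (an assumption; exceptions exist).
VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}


def implicit_hydrogens(atoms, bonds):
    """Return, per atom, how many undrawn hydrogens fill its usual valence."""
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [max(VALENCE[el] - u, 0) for el, u in zip(atoms, used)]


def molecular_formula(atoms, bonds):
    """Build a formula string in Hill order (C, then H, then the rest A-Z)."""
    counts = Counter(atoms)
    counts["H"] += sum(implicit_hydrogens(atoms, bonds))
    if "C" in counts:
        order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    # Skip elements with a zero count and omit the subscript when it is 1.
    return "".join(
        f"{el}{counts[el] if counts[el] > 1 else ''}"
        for el in order
        if counts[el]
    )


print(molecular_formula(ATOMS, BONDS))
```

Note that nothing about the original drawing (layout, fonts, line styles) survives in `ATOMS` and `BONDS`; only the connectivity does, which is the point about presentation being discarded.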

To my knowledge there are no licenses on older papers that allow publishing the analysis (though within the University we have agreements with JSTOR, for example). Looking at public domain stuff is OK. This is part of the issue - that knowledge should be a public good.

If you're a lawyer I'm sure Peter would appreciate hearing your opinion.

jdmichal · on Feb 19, 2014

I would think that the conversion from a rasterized scan to an "open markup language" would be sufficient to count as a new presentation, no?

pbhjpbhj · on Feb 19, 2014

>conversion from a rasterized scan to an "open markup language" //

[Skip to the end!]

[strikeout]

The problem is that a rasterised scan is a new - potentially unauthorised - copy. UK law tends to be more restrictive as we don't have the same sense of "Fair Use" as 17 USC.

Art 5(1)(b) of the EU Copyright Directive (2001/29/EC; implemented in the UK as Section 28A of the CDP Act) allows for transient copies to be made when the copying is part of an otherwise allowed act. But the stipulation is that the copy can't have independent "economic significance".

Here the rasterisation then would appear to fail, even if it can be considered a transient part of the transfer of the information from copyright diagram to free-libre ML.

This to me - as a non-expert [though I consider myself pretty well read on copyright] - means that the conversion needs to be made without making an intermediate copy. A manual process would bypass the problem of making a copy for a computer program to analyse at the expense of lots of human input.

This is where things get silly as the end result is the same - the extraction of information from a catalogue of molecular diagrams - the process is just made more expensive. I rather hope my analysis is wrong actually and that a court would rule that scanning such works would be allowable in order to extract the informational content; would love to have more input here.

[/strikeout]

... actually further looking at S.28A(b) makes me think I am wrong; that this should be allowed. I'm convinced that the copies made aren't independently commercially significant and that as the process of extracting the information from the diagrams is an allowed use then the "transient copy" legislation makes this allowable.

IA[of course]NAL.

Blahah · on Feb 19, 2014

Crucially, here the rasterised scan is made by the publishers, and whoever runs the analysis software is allowed to access the digital image.

pbhjpbhj · on Feb 19, 2014

I'm not sure that is crucial - I'll bet there are terms associated with the allowance of access to the already rasterised images along the lines of "solely for individual reading".

However, I'll leave it there, as it's too complex an issue to address in generalities rather than by looking at the specific nature of the inputs, the processing and analysis, the intended uses, the commercial aspects and such.

I pray every blessing on your knowledge sharing endeavours.