This is the question I’m trying to answer:
How can a computer scientist do research without using and producing only free and open source software?
This question is the corollary that follows from this hypothesis:
Free and open source software (FOSS) is the only way to produce and use software that follows the scientific method.
There are all sorts of reasons a human programmer might want to keep their computer code a secret. But if that human is also a scientist, isn’t it their duty to produce science that can be verified?
Looking at the scientific method definition:
To be termed scientific, a method of inquiry must be based on gathering observable, empirical and measurable evidence subject to specific principles of reasoning.[2] A scientific method consists of the collection of data through observation and experimentation, and the formulation and testing of hypotheses.
Often to verify science requires access to the same equipment – particle collider, electron microscope, or space-based telescope. At some level, especially where the machinery is unique, the best method is to provide access to the machine for other scientists to verify. But where are you if your science can only be produced and reproduced using a black box that you have to buy, and that black box clearly has an influence on the outcome of the science? How can you verify the science if you cannot study the insides of the black box to understand how it affects the outcome of the experiment?
How can I verify your science if your code isn’t open? I have to be able to fully observe your experiment to verify your hypothesis. How can I measure your evidence if the tools of measurement, your software code, are compiled and only available in binary form?
You could say that having non-commercial access to the source code would work for verifying the experiment. That presupposes that the verification doesn’t require a commercial interaction that would be forbidden. For example, if the code supports a new business method, I can’t verify that the code or method works without doing actual business with them. Ultimately, these field-of-use restrictions block unforeseen uses of the science, such as humanitarian or medical ones, where the ethical stakes are higher but couldn’t be predicted in advance (when choosing the original licensing/release terms). These field-of-use restrictions also have a chilling effect on other scientists. It’s less clear what is or isn’t a violation, so they seek a scientific solution in code (software) that is less ambiguous.
In addition, restrictive terms for viewing source code have the effect of tainting the recipient. How is it scientifically ethical if you require a colleague to sign away an unknown number of enquiries in a lifetime of research for fear of violating a source code sharing contract?
There is a whole mess in here with patents, and this is related to why patents may be unethical for science. In a machine patent, the science isn’t necessarily being patented; it’s the results of the science that are. Any science that leads up to the machine patent should be open and visible for reproducing and verifying.
But a software patent is a slippery thing. The patent may cover the science as well as the product of the science, in that both can be in the code. There is an ethical dilemma for any scientist when they patent the science. They are putting a price tag and control on reproducing and verifying the science. Without verification, the science is invalid.
In case you are wondering if this is just semantics and word choices, it is. Perhaps all of the people who call themselves computer scientists shouldn’t? I presume the word has meaning for them, as it does for the rest of us, and I expect them to act accordingly.
Being a scientist has a specific meaning that spans a long part of written history. How long? Several hundred to several thousand years, depending on what you are measuring. It is clear that the scientific method has been followed since at least the Middle Ages. It predates copyright and patent law by at least several centuries, if not nearly a full millennium.
It was Sir Isaac Newton, amongst others, who said,
If I have seen a little further it is by standing on the shoulders of Giants.
That, friends, is the whole point of FOSS. It is, so far, the best way we’ve found as computer scientists (schooled/amateur/citizen) to live up to Newton’s ethics and methods.
I dubbed my point above as a hypothesis because I am opening this idea for debate by scientists, particularly computer scientists and scientific ethicists.
As mostly an aside, there’s a long-running debate inside computer science about whether it actually *is* a science or not. You can argue that we aren’t understanding natural phenomena when we create abstractions and then reason inside them — we’re doing something much closer to mathematics than physics, and we don’t say that mathematics is a science.
Of course, this doesn’t change my belief that code produced inside academia should be free and open. But I think it does suggest that you need a more subtle argument than “Hey, computer scientists, you claim you’re doing science, right?”, because the answer for many will be “.. no?”.
It’s perhaps also worth noting that this problem is not at all limited to CS. Many biology professors will take out patents on methods of diagnosing specific diseases involving (for example) genetic tests against a given gene. All of your arguments would apply equally well to them, I think.
I’ll widen that out. How can any research scientist do research at this point without using openly developed, peer reviewed, published software tools?
It’s amazing how much of current day scientific output relies heavily on the processing of digital data. Even if it’s simple data regression tools inside a spreadsheet, software tools underpin a very large amount of the published scientific literature that is being produced. How much of that software has been peer reviewed?
Right now, there is zero incentive for research scientists to make use of open tools. There’s no requirement that I publish the software I use to generate a specific scientific result as part of the peer review process when I write an article. Similarly, there is no such requirement for physicists, biologists, chemists, and so on. Software tools are simply artefacts that are used and thrown away in the course of generating results.
Few if any researchers in a particularly specialized field are qualified to review the software in use when reviewing scientific output in the form of published articles. Until it’s a requirement to publish the digital tools used to manipulate raw data and to fashion scientific understanding, there will be no incentive for researchers to use open tools in the face of pressure from the institutions to protect and cultivate IP that the institution can then relicense to others. Case in point: the history of the NAG libraries.
I agree with both comments. My goal in putting forward the first hypothesis was to use it as a stepping stone to a wider hypothesis:
Is it necessary for me to prove the first in order to move to the second? I see from both of you that maybe we should just make that jump in the argument, but I see some value in having a CS-specific version.
(A) Computer Science doesn’t really exist. (B) Software Engineering isn’t Engineering, given there really are no standards, nor is it applied science.
Basically, software development is a mixture of general contracting (i.e. construction) and architecture — at best.
Which is why, about the software itself, I don’t get terribly excited about it.
Strategy of what you create is more interesting than how you create it.
Anyway, if your theory were true, it should probably hinge on whether something is theoretically more testable or better QA’d, which is obviously *not* true of FOSS compared to most commercial releases. Nor has FOSS tested the market better, or else, say, on the desktop, it would have won out.
Right, I’m saying that I think the second argument is obvious and strong, and the jump is actually when you go from the second to the first, rather than vice-versa. 🙂
It might not turn out that the standard of adequate openness for publication and research is the same as a user-freedom-driven definition of “free software” or “open source”. For example, a “just like a book” or “no derivative works” license might be adequate for publishing a new CS result but not enough to qualify as Free.
That is probably the case, yes, and is it because those standards are out of date?
For example, if you publish a biology paper under normal copyright, that doesn’t stop me from building on the ideas in that paper. (Presuming you haven’t patented *spit* them.) So, normal copyright and scientific publishing standards apply.
The same isn’t true for a CS paper. In many cases, the code and the ideas are inextricably linked. Now standard copyright isn’t good enough; I can’t rely upon it to build on your science.