Friday, April 06, 2018

The Death of the Scientific Paper

I’m sitting in my office at Université Laval, waiting for an opportunity to speak with my professor and procrastinating on revising a manuscript. My procrastination of choice is almost always reading the internet, and today I’ve found a new article in The Atlantic, “The Scientific Paper is Obsolete”.

The main thesis of this article is that the scientific paper as we know it today has outlived its utility. The author, James Somers, opens with a description of the niche the scientific paper was invented to fill: a short, incremental advance published as widely as a book but as readable as a letter, and permanent where a lecture is ephemeral. I’ve had conversations with academics in the social sciences and humanities who expressed surprise that books (for argument’s sake, publications longer than about 100 pages) almost never appear in the reference lists of my scientific publications. My C.V. lists 11 publications, all scientific papers, on which I am an author (always one of several; I have no sole-author publications), and I’m first author on 7 of those, which means I did most of the actual writing. I feel this experience gives me some perspective from which to evaluate the article in The Atlantic.

There are the expected jabs at the style and perceived readability of scientific papers, a criticism so widespread and consistent that I now mostly ignore it. I get it: you don’t get the same enjoyment from reading a scientific paper that you get from reading something else, and you largely blame the abundant jargon and dense prose of typical scientific papers. James Somers also mentions “mathematical symbols”, which are indeed a major feature of many scientific papers that separates them from written works intended for a wider, non-specialist audience. But that’s the point: the intended audience of a scientific paper is not the general public, it’s other experts in the same discipline. Know your audience. I guess James Somers does; complaining to non-scientists about the difficult prose of scientific papers, whether the complaint comes from scientists or non-scientists, is a very popular theme in popular science writing.

This isn’t to say that a scientific paper cannot or should not be highly readable to non-specialists and other members of the general public, but to approach a scientific paper as a non-specialist and then complain about the jargon is to miss the point. I think one has to approach a scientific paper from a position of self-knowledge: I have to read a paper outside my area of expertise in a different (and more difficult) way than a paper that might cite my own work.

Another major difference between a scientific paper and something like an article in The Atlantic (the two are of similar word count, on average) is the abundance of citations in a scientific paper. Every fact, every suggestion, every piece of information in a scientific paper that is not derived directly from the study itself will be cited; credit is given to the prior work that established those facts or provided those suggestions (unless the fact or suggestion is obvious or already widely known and established; we don’t cite Scheele and Priestley (1772) when talking about oxygen, for example). I find myself wishing for some citations and outside attributions while reading this Atlantic article, because James Somers makes so many claims that I would like to dispute.

For example, here’s the third paragraph of the article:

The more sophisticated science becomes, the harder it is to communicate results. Papers today are longer than ever and full of jargon and symbols. They depend on chains of computer programs that generate data, and clean up data, and plot data, and run statistical models on data. These programs tend to be both so sloppily written and so central to the results that it’s contributed to a replication crisis, or put another way, a failure of the paper to perform its most basic task: to report what you’ve actually discovered, clearly enough that someone else can discover it for themselves.

Are papers really longer in 2018 than they were, on average, in 1998, or 1978, or 1888? Are they more “full of jargon and symbols”? Are the majority of analytical computer programs “so sloppily written”? And what replication crisis? Mr. Somers, have you not read Dr. Fanelli’s counterargument to the crisis-in-science narrative, recently published in PNAS?

Moving on: one major criticism in the article is that scientific papers are not a good way to express and describe complex results. Animations, something computers are quite good at, are useful tools for visualizing complex concepts but are very difficult to express on a static sheet of paper, which the modern PDF (Portable Document Format) emulates. I agree with that much, but I do not agree with the follow-up point that this renders the PDF hopelessly useless. A scientific paper is about the words, not the pictures or other visualizations. It’s about the information. Expressing that information in a way the audience can understand and use is the key skill of writing a scientific paper, and it is distinct from the skills needed to create written material intended to be read by as wide an audience as possible. A scientific paper relies heavily on absolute honesty, and on presenting all of the available and relevant information so that readers can decide independently whether to agree with the author’s arguments and conclusions. A magazine article pushes a particular interpretation of some phenomenon. A scientific paper pushes the phenomenon and then describes one (or sometimes more) possible interpretation of that phenomenon, usually in light of similar phenomena and potential alternative interpretations. A graph is not data; it’s an expression of data. An animation is not an argument; it’s one support for an argument.

Visualization is a technique, a way to take obscure numbers and show the patterns they contain. I struggle with it constantly. The paper I am procrastinating on right now has some decent figures* in it, and I don’t see a need for a great deal of work on the visualization side of this paper. I have another project at a much earlier stage, and my current activities there are primarily concerned with visualization. I’m at the “data exploration” stage, where I throw the metaphorical spaghetti of the data at the metaphorical wall and see what sticks. That means lots and lots of images, mostly graphs I get my computer to make for me, and some scribbles on paper in my notebook.

*A figure is any image in a scientific paper: a photograph, a map or, most commonly, a graph illustrating the mathematical relationship between two or more parameters. I tend to write papers by making the figures first, but that’s a matter of personal style and workflow, and certainly not universal among scientists.

Back to The Atlantic

It’ll be some time before computational notebooks replace PDFs in scientific journals, because that would mean changing the incentive structure of science itself. Until journals require scientists to submit notebooks, and until sharing your work and your data becomes the way to earn prestige, or funding, people will likely just keep doing what they’re doing.

This is more interesting to me than the preceding description of competing formats for “computational notebooks”. I have seen suggestions from other people that concentrate on changing other aspects of scientific publishing, often the abolition of for-profit publishing companies (e.g. here), but these suggestions and discussions do not express dissatisfaction with the basic unit of scientific communication, the scientific paper. What would my job look like if both scientific papers and the way they are disseminated were to go away? Would I just be uploading lumps of code and data tables to some institutional server whenever I feel my analyses have answered some tiny question? Does my "Literature Cited" section just become a link-dump?

“At this point, nobody in their sane mind challenges the fact that the praxis of scientific research is under major upheaval,” Pérez, the creator of Jupyter [one of the competing calculation notebooks – MB], wrote in a blog post in 2013. As science becomes more about computation, the skills required to be a good scientist become increasingly attractive in industry. Universities lose their best people to start-ups, to Google and Microsoft. “I have seen many talented colleagues leave academia in frustration over the last decade,” he wrote, “and I can’t think of a single one who wasn’t happier years later.”

I had to look up the definition of “praxis”; I think it’s exactly what I was talking about: what does my job look like if the scientific paper and scientific publishing are drastically changed? Dr. Pérez apparently thinks my job would not change much. I’m not so sure.

There’s also a problem in that paragraph with a possible logical fallacy: confirmation bias. Lots of sad people leave, and then you find a few of them later and they’re happier. Well, good! Happier people are a good thing. But to then claim that it was the act of leaving that made them happier, and to extend that by implication to suggest that everybody should consider leaving, is to stretch beyond the available information into unsupported (and idealistic) speculation. If the only people who left were the unhappy people, then what about the happy people who stayed? Would they have become even happier had they left? And did the people who stayed unhappy, or became more unhappy, after leaving avoid talking to you?

At this point I’m wandering away from the discussion about scientific papers, and I think the article did, too. It concludes with a weak suggestion that maybe some new tools will be useful (who could disagree with that? Tools are useful by definition) and that, hey, Galileo, right?

I remain unconvinced of the impending death of the scientific paper. What I got out of this article was a description of some computer programmers and physicists with generally poor social skills but good ideas and skills related to generating and analyzing data, along with the implication that the time I spend teaching ESL graduate students to write better English, in the demanding, highly technical style of current scientific communication, is somehow wasted.
