Thursday, May 17, 2018

Les mots difficiles


This is a list of words, grammar points, and writing conventions I put together during my time at Université Laval, based on my work there helping students and others with written English, especially scientific material such as thesis chapters, manuscripts for peer-reviewed journals, and grant applications. Most of these are examples from actual pieces of writing I was working on; a few I did not see directly, but they reflect similar situations that arose. Most, perhaps all, are based entirely on my own personal understanding of English writing in science, and I have tried to indicate the parts where subjective considerations such as personal style or the context of a particular phrase outweigh what I might consider to be correct.

This is a work in progress, though progress is stalled at the moment because I have no non-English-first-language colleagues I am currently helping. Suggestions, corrections, comments, and general discussion are very welcome!

Ameliorate – in English, this word means “to make less bad”, but in French, “improve” or “increase”. This makes the sentence “He ameliorated the suffering” mean completely different things.

Augment – this word is rare in English, and is almost always used in the passive voice, paired with some form of “to be”. For example, rather than “the water level augmenting” (here, “augment” is used as a synonym for “rise”, though it’s not really a full synonym), write “the water level was augmented” (and the rest of the sentence would be expected to include a description of what caused this augmentation – e.g. “was augmented by recent rain”).

Appendix / Annex – The additional section of a document after the References (or Literature Cited) is the Appendix; if there is more than one they are usually separated by numbers or sometimes letters: “Appendix 1, Appendix 2” or “Appendix A, Appendix B”. The word “Annex” means something different in English, though the meaning is usually clear.

Current / currently – The English translation of actuel is “current”, not “actual” (and actuellement becomes “currently”). When discussing things happening in real time, or when describing a situation in the present tense, use “current” or “currently”: “Current legislation includes restrictions on land-uses of this type”; “Domestic animals are currently not permitted on the site”. When describing how things are in reality that may be surprising or contradicting previous facts or statements, use “actual” or “actually”: “While this belief is widespread, actual conditions do include grazing by livestock”.

Global – it’s not wrong to use this word to describe conditions across an entire study area or project, but it is rare to use it that way in English. Most often, global refers only to the entire planet Earth. Or, you could be talking about global conditions on Mars, I suppose. “All of a big sphere” is a useful working definition – if your study or project can be thought of as a large ball, then “global” works well.

Humidity – this is a measure (relative or absolute) of the amount of water vapour dissolved in air or another gas. The similar French word humidité translates to moisture, damp (or dampness), or wetness. It is possible to measure soil humidity, the water vapour in the air spaces within the soil, but it is much more common to measure the amount of liquid water in soil, the soil moisture.

Plantation (and other words ending in –tion) – in English, words like this that end in –tion are almost always nouns naming a thing or a result, while in French they often name the action itself. In some cases, the –tion form does not exist in English, such as “planification” (corrected: “planning”), but in other cases a spell-check function will allow the word because it is a noun in English. “Plantation” is an example – it’s a noun, and means an area of land deliberately planted with a crop species (such as cotton or sugarcane, or commercially valuable trees) and the business associated with it. To describe the deliberate placement of trees or other plants, use “planting”, and use it as a verb. The noun is also acceptable, when used as a noun.
            “Plantation” has some associated baggage, in that the word is most often used in historical descriptions of slave-labour economic activity in the time before the abolition of slavery in countries or regions such as the United States (abolished 1865) where the word “plantation” often follows a specific crop, such as “cotton plantation”. For reasons I don’t know, combinations like “wheat plantation” or “fruit tree plantation” are extremely rare.

Sensible / Sensitive – a ‘false friend’ – the French word sensible translates to the English word “Sensitive”. “Sensible” (in English) means “full of sense”, usually in reference to a person or idea: “Her plans are sensible, and her nose is sensitive.”

Realise / Realized – often as “thing was realized by method” – this is another not-wrong-but-not-common word, like “global”. In English, anything can be realized, but the things most commonly realized are goals, dreams, and similar abstract concepts; the emphasis is on something that does not exist (a dream of a better world) becoming real (the world has become better). “Made” is a decent substitute, and if they fit, “completed”, “conducted”, and “performed” are pretty good, too. If it makes sense to say “made real” then use “realise”.
            Realise is also often used in the sense of a new or previously-ignored piece of information coming to a person’s attention: “I realized the measurements were biased by condensation inside the instrument”.

Repeated / Repetitive – A measurement or other phenomenon with clear boundaries in space and time may be repeated. A task or activity may be repetitive, because it is boring, simple, and must be repeated many times.

Resilience / Resistance – Resilience is the capacity to survive or remain intact through some challenge, though damage may occur. Resistance is the capacity to completely stop the effects of a challenge, sometimes up to some point, after which comes (perhaps) catastrophic failure. “The weeds are resistant to the pesticide, and are resilient to drought.”

Species – the final S is never removed. One species, two species, some species, no species. This is also a problem for many native English speakers. “Specie” is a completely different word; it’s the generic name for coins and other objects used for currency and is very rarely used. The plural of genus is genera.

Vulgarise / Vulgarisation – in French, this term (vulgariser) is used to describe the way scientists and other technically-minded people explain their work to the general public. In English, the verb is very rarely used, but the adjective, vulgar, is used to describe things that are distasteful or disgusting, or actions and people that are rude or appealing to the worst elements of society. For example, a celebrity (actor, politician, etc.) that sexually harassed a member of their staff and then refused to apologise for this might be described as a vulgar person, or his behaviour as vulgar. It’s a synonym for “very rude”, usually with the unspoken implication of not being ashamed of this behaviour.
            There is no English verb in common usage that describes this activity the way the French verb vulgariser does, to my knowledge. In English, we might speak of “creating a plain-language version” or “rewriting for a general audience”; occasionally people will use the term “lay person” or “lay public”, a throwback to the difference between communications within the Church between clergy, and the communications from the Church to the congregation of regular people, the lay public. Essentially, “lay” is used to mean “anybody outside of my area of specialisation”.
            The words “disseminate” or “popularise” can work, but have some implications behind them – disseminate does not suggest any changes to the information (for example, to adapt it to a different audience), and popularise suggests some positive advocacy, as in attempting to convince the audience that this is a good idea, and encouraging the audience to spread it further.
           
Weigh / Weight – the verb is “to weigh” (without a T), the noun is “weight”. Compare “This weighs 3 grams” vs. “The weight of this is 3 grams”.

Les autres problèmes communs

Contractions – Don’t. It’s distracting, isn’t it? They’re not used in formal writing. This includes formal emails and the like, such as when a student writes to a professor they do not already know well. English doesn’t have the requirement to elide words together over vowels the way French does.

Contrasts – It is very common to see expressions such as “In contrast”, “Inversely”, “Conversely”, “On the other hand”, “However” et cetera in scientific writing because often the author wishes to draw attention to divergent circumstances or results. Detailed usage notes are beyond the scope of this document, but some such expressions are best placed at the start of a sentence (e.g. “In contrast”) while others are best in the middle of complex sentences (e.g. “conversely”). “Inversely” is rare, “Conversely” usually works better.

Negatives and double negatives – most of the time, negatives are easy to use and double negatives are easy to avoid. Some words can be read as negatives in some contexts, so it is worth carefully reviewing complex sentences that include negatives in at least one clause. “Without”, “Instead”, “Beyond” and many others can fall into this category.

Possessives – English provides two common ways of indicating possession: the apostrophe-S and the “possession of subject” form that is similar to many possessives in French. For example, it is equally correct to describe the peat’s depth or the depth of the peat. I have been told that, for a Francophone, using the apostrophe-S “feels more English”. To an Anglophone like me, the apostrophe-S sometimes appears more informal than the X-of-Y form, but the X-of-Y form can appear awkward.

Since / Because – Because “since” can mean either “because of” or “in the time elapsed between then and now” it can be unclear when read; context determines which of these two sometimes quite similar definitions is intended. You may have been taught not to start a sentence with “Because” (I certainly was) but that rule is outdated and rather pointless. Because of shifting language use since the turn of the 20th century, and the potential for confusion with “since”, I suggest avoiding “since” and using “because”.

This / That / It – It can be difficult to translate French words such as Ça, Ce, Ces, Cette, Ceci, and Cela. Context is the main factor for determining which to use in a given situation. In scientific writing, “it” and “its” are less common than “that” and especially “this” – when in doubt, the pronoun you are looking for is probably “this”. Que and its variations almost always translate to “that” but of course there are many exceptions.

Were vs. Have been – The difference between these two forms of past tense is subtle. Things “were” different; this implies that the time of difference is finished, possibly with a long interval in between. Things have been different; this implies that some important event has happened to create the break between then and now.
            For example, when describing the results of another study, use “has been”, as in “Strong growth has been reported (Smith et al., 2010)” rather than “was”: “Strong growth was reported (Smith et al., 2010)”. This is a small point, and probably shows more about my personal opinions than about widespread practice in science. 

Plurals

Many English nouns are not changed when plural, or are considered plural only in unusual circumstances; these are mass nouns (e.g. “water”, with similar rules in French – « les eaux » est rare). Some are most often plural but have a valid, if rarely used, singular form. Many of these rules are not well known by native English speakers.

Sometimes, a number greater than one appearing in a sentence does not create a plural. Measurements that describe an object do not indicate more than one object, and can be considered singular descriptors equivalent to non-quantitative properties such as colour or qualitative descriptions of size (“large”, “heavy”, etc.). For example, a bog that covers 50 hectares could be described as “a 50-hectare bog” but not “a 50-hectares bog” because there is still only one bog. If the number and its unit could be replaced by a non-quantitative descriptor (“a large bog”, not “larges”), the unit is singular.

Data / Datum – A collection of data are “data”. A single point or measurement is a “datum”. A bunch of data put together in one place with some organisation is a “dataset”. The plural is “data”, so you can talk about “my data” or “these data” but not “this data” or “a data”.

Dice / Die – You can roll two or more dice, but if you have only one, you have a die.

Fish / Fishes – One fish. Many individuals of one species of fish are “many fish”. Several different species are “fishes”.

Research / Researches – Most of the time, what several scientists are doing would be referred to as “research projects” or “conducting different research activities” or some other way to avoid pluralising “research”. “Researches” is not wrong, it’s just so rare that it is disruptive when reading.
           
Information / Informations – the only example of multiple informations I can think of would be a situation in which two or more sources of information were competing or in conflict. “Information” is a mass noun, like “rice” (“rices” would refer to multiple strains or species of rice, or to multiple different rice-based foods).

Moss / Mosses – same rule as for fish, though because moss individuals are often difficult (or meaningless) to distinguish between, the singular is used for continuous properties: “A carpet of moss; Moss covered 90% of the plot.” But the plural is also widely used: “We identified three mosses and four lichens; mosses covered half of the plot”.

Les conventions Scientifiques

Active voice, as opposed to passive voice, is now widely preferred. The internet is full of discussion of the relative merits of each, but the consensus seems to be that active voice is easier to read: “We hypothesize that...” rather than “It was hypothesized that...”. Active voice is usually easier to write, too, and passive voice can be saved to emphasize particular concepts or procedures: “We discovered several problems with this method. It was developed under different circumstances than the current study, and has several drawbacks as a result.” The passive voice is still useful, for example, see the first sentence of this paragraph. Of note, passive voice avoids assigning responsibility or blame for actions, so you can use it to criticise some thing without explicitly criticising the person that created or used that thing: “The vehicle was left unlocked and a number of items were stolen”.

Adverbs such as “really”, “very”, “extremely”, et cetera are rare in scientific writing. They contribute little additional information and are not quantitative unless explicitly defined a priori as indicators of particular categories. “The water table was very high” conveys no information not present in “The water table was high”; even better than both of those is “The water table was 10 cm higher than the expected value”.

Biological species names are always italicised, and the genus name is always capitalised while the species name is never capitalised: Homo sapiens, Sphagnum fallax, etc. It is extremely rare to start a sentence with just a species name without a genus; when using species names in Every Word Capitalised titles, do not capitalise the species name: “This Paper Is About Sphagnum fuscum”. Higher taxonomic ranks (Family, Order, Class, Phylum, etc.) are not italicised but are capitalised when used as a name: “Hominoidea evolved millions of years ago” and not capitalised when used as a descriptor: “hominoids appear in the fossil record from millions of years ago.”
                    When a species name will appear more than once, you can abbreviate the genus to the first letter: S. fuscum. If several species are in the same genus, you can use this to avoid typing out the full name every time: “Sphagnum fallax, S. fuscum, and S. magellanicum”. If there are several genera with the same first letter, distinguish between them with additional letters: Carex canescens, Calluna vulgaris... Cx. canescens, Cl. vulgaris.

Chemical names can be written as their chemical formulae for all elements and most simple compounds: “CO₂; H₂SO₄; Al” as long as the relevant rules about sub- and superscripting numbers are respected (the number of atoms is subscripted, the ionic charge is superscripted and is at the end of the name, minority isotopes are superscripted and are on the left of the relevant atom: ¹⁵NO₂⁻).
            Longer and more complicated names can be abbreviated by placing the abbreviation in parentheses after the first time the full name appears: “Deoxyribonucleic Acid (DNA)”. “DNA” is a special case, though, because it falls into the category of abbreviations that are better-known than their full names; other examples include the bacterium Escherichia coli (E. coli), and the nematode Caenorhabditis elegans (C. elegans).

Citations are a large topic and mostly beyond the scope of this document. However, it is worth discussing their general use in typical scientific writing. When using the very common name-and-date style, there are two main forms: either the author’s name is inside the parentheses (Brummell, 2018), or outside as in Brummell (2018). Starting a sentence with a citation is acceptable: “Brummell (2018) provides a large amount of useless advice”. Author-inside citations are most often at the end of a sentence (Brummell, 2018) but can be placed between clauses or phrases (Brummell et al., 2018). It is rare to use citations in a form like “According to Brummell (2018), everything is terrible”; instead use forms like “Everything is terrible (Brummell, 2018)” or “Brummell (2018) suggests that everything is terrible”.
                    When using author-outside citations, be careful about how the work is described. “Brummell (2018) describes some examples, and found some conclusions.” Note verb tense: “describes” is present tense even if the citation is old, “found” is past tense even if the citation is very new.
                    For numbered citations, follow the rules for the specific journal you are writing for; some will require square brackets [1] and may require the citations to be superscripted² or italicised or otherwise formatted in a particular way. Using a numbered citation to start a sentence can still be done, but those rules from the journal will become even more specific.

Contractions are to be avoided in scientific writing. With a few rare exceptions, published papers do not include “it’s”, “can’t”, “won’t” et cetera. Apostrophes most often denote possessive, and that is also mostly rare. Note that the possessive for “it” is the exception to the apostrophe-for-possessive rule: its similarity to “it is / it’s” means the apostrophe is dropped. “Its” is one of the few possessives commonly found in scientific writing.

Figures and Tables must be mentioned in the text in numerical order. So, Figure 1 is mentioned in the main body of the text before Figure 2. It’s usually best to mention the entire figure before mentioning panels or parts of it, and panels / parts should be mentioned in their numerical or alphabetical order: “Water table varied over the growing season (Figure 2), with the highest levels found in the eastern area (Figure 2A) and the lowest variation in the centre (Figure 2C).”
                    Tables get a short descriptive title that does not normally include the word “table”. Similarly, don’t name a figure by the type of graph, such as “Figure 1. Boxplot of soil and air temperatures”. Just name it for the main variables or the part of the study it’s for: “Soil and air temperatures at the plots closest to the pond”; if it’s for the entire study, just the variables or the analysis is fine: “Model simulation output when vascular plants were excluded”. Figure captions go under the figure, and nothing goes above the figure. Tables can have a few lines of text explaining abbreviations or the meanings of symbols used in the table under the table: “*significantly different at p < 0.05; ** significantly different at p < 0.01”. One major exception for figure naming is multivariate statistical visualization techniques such as PCA and NMS – the figure title is often something like “PCA of species characteristics from the primary study site” because without some guidance the output of a wide range of very different analyses all look the same.

           Your data are used to create your figures and tables; figures are visualizations of data, and are used to aid interpretation of data. Do not interpret your figures or tables; interpret your data and refer to your figures and tables in this process. “According to Figure 2, Blue is more abundant than Red” is incorrect. “Blue is more abundant than Red (Figure 2).” is correct. You created your figures and tables; you did not discover them buried in a peatland!

Fonts and style choices are up to you, but some journals specify a particular font or short list of fonts they prefer; often this includes very widely used choices such as Times New Roman, Calibri, and Arial. You can read up on the differences between fonts, especially the split between Serif and Sans Serif fonts, if you’re interested. Whatever you choose, be consistent. All text throughout a document should be in the same font, including the text within figures and tables (e.g. map legend, axis labels, numbers). In MS Word, you can select the entire document with CTRL-A, then set the font from the drop-down menu on the Home (Accueil) tab.

Formality – Scientific papers tend to be written in a highly formal style. It is easy to go too far, and write something that resembles a historical document from centuries ago; old-style English looks more formal (forsooth!).

New names are one area where most other rules do not apply. You are free to name your study sites, novel equipment created by you, and other unique items anything you like. This makes a good joke (well, good to a certain sense of humour) when particularly clever, especially if the name can be linked to an unusual citation or an obvious pun (or both).

Numbers are written using the numerals unless 1) the quantity is a whole number between one and ten, inclusive or 2) the number is the start of a sentence: “Five hundred people attended the concert.” For numbers that are immediately followed by a standard unit (g, cm, mol, s⁻¹, etc.) or include a decimal (3.8; 0.99), use the numerals. There are some borderline cases, mainly dealing with time – years, days, minutes. In general, if the number and unit are only going to be used together once in a paper or chapter, follow the above rules: “Five years”. If the unit is included in any calculations or combined with other units, use the standard abbreviation and use the numerals: “5 kg yr⁻¹”. Always use the numerals in figures and tables.
            Scientific notation is preferred. Microsoft products, among other software, use the convention of a capital letter E to denote a (base 10) exponent. This is not suitable for scientific publications; show the base, and write the exponent as a superscript: 4.8 × 10³; 1.99 × 10⁻⁶. Place the decimal after the first digit: “17.4 × 10⁶” is incorrect; it should be “1.74 × 10⁷”. SI units have standard prefixes for every third exponent level – k, M, G (for 10³, 10⁶, and 10⁹), m, μ, n (for 10⁻³, 10⁻⁶, 10⁻⁹) – but mixing these prefixes with exponents is unusual: “1030 μg” or “1.030 mg”, rarely “1.030 × 10³ μg”.
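If your numbers come out of a spreadsheet in E notation, the conversion described above can be automated. What follows is only a minimal sketch in Python, not a standard tool: the function name to_scientific is made up for this illustration, and it assumes the input is a plain finite number written as text (such as “17.4E6”). It moves the decimal to just after the first digit and writes the exponent with Unicode superscript characters.

```python
from decimal import Decimal

# Map ASCII digits and the minus sign to their Unicode superscript forms.
_SUPERSCRIPTS = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹")


def to_scientific(text: str) -> str:
    """Rewrite a number such as '17.4E6' in the form '1.74 × 10⁷'.

    Hypothetical helper for illustration; assumes a finite numeric string.
    """
    value = Decimal(text)
    if value == 0:
        return "0"
    sign, digits, exponent = value.as_tuple()
    # The power of ten once exactly one digit sits before the decimal point.
    power = exponent + len(digits) - 1
    mantissa = str(digits[0])
    if len(digits) > 1:
        # Keep every digit as written, so trailing zeros are not lost.
        mantissa += "." + "".join(str(d) for d in digits[1:])
    prefix = "-" if sign else ""
    return f"{prefix}{mantissa} × 10{str(power).translate(_SUPERSCRIPTS)}"


print(to_scientific("17.4E6"))   # 1.74 × 10⁷
print(to_scientific("1.99E-6"))  # 1.99 × 10⁻⁶
```

Using Decimal rather than a floating-point number keeps the digits exactly as they were typed, so “1.030E3” stays “1.030 × 10³” instead of losing its final zero.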

Parentheses ( ) are used for in-text citations in many journal citation styles (Smith and Wong, 2016), to separate measurements within tables and figures (± SE), and within mathematical formulae, though sometimes square brackets [ ] are used in these ways. In scientific writing it is extremely rare to use them to make a tangential point (or explain a detail, or provide an example) within a normal sentence. Most in-line tangents (like here) can be easily replaced by commas: Most in-line tangents, like here, can be easily replaced by commas.
            Parentheses break up the flow when reading; many readers are in the habit of skipping everything inside parentheses because they usually indicate that the words inside the parentheses are unimportant details of interest to only a tiny minority of readers (that is, most parentheses contain citations, and I will look up only one or two citations in most papers that I read). Placing a single word within a parenthesis pair (example) is disruptive and usually does not help to explain the concept. Just write normal sentences (please).

Proof is extremely rare in science. This is a matter of the philosophy of science; under a strict falsificationist view, nothing can be proven, but hypotheses can be disproven. Avoid using the words “proof” (noun) and “prove” (verb) and their variations (e.g. “proven”). Even the word “disprove” is rare, because most authors do not spell out the results of their experiments in such terms; they usually assume the reader can follow along with the line of evidence that conclusively favours one hypothesis over another.
            Previous studies provide evidence or found interesting results; they did not prove your point for you.

Quotations are rare in scientific writing, but common in other academic disciplines. Using a quotation indicates the exact words are important. Using a citation indicates the concept is important, not the exact words used to express it by another person. Because in science we usually care more about the concept than how it was first described, we cite information presented in our own words. English uses the superscript, double-inverted-commas style for indicating quotations, while other languages have their own conventions. The use of non-“English” quotation marks – even for quotations in another language – is ‹‹ highly disruptive ›› “when reading„.

Scientific instruments and specific materials can be mentioned in a scientific paper, typically in the Methods & Materials section. The name of the manufacturer (not the brand) follows inside parentheses, with the city and country where that manufacturer (or its head office, for large multinational companies) resides. Do not use ©, ® or ™ to indicate the copyright or patent status of an item or idea; credit the copyright-holder or inventor by naming the person or company that produced it – the point is to show other scientists where they can get similar materials in order to replicate your work. Use the current full legal name of the corporation: “Thermo Fisher Scientific”, not “Fisher”; “MilliporeSigma Canada Co.” not “Sigma” or “Sigma-Aldrich”. Sigma is a tough one; their corporate structure is unclear. If you bought something from their Canadian distributor, it’s “MilliporeSigma Canada Co.”; if you paid in American dollars, it’s “Sigma-Aldrich Corp.”.

Slang – Avoid slang if possible. This includes common expressions and the broad array of words that might be considered excessively informal by some audiences. Much of this comes down to personal style. If you think you can get away with it, it can be fun to sneak a bit of slang into a manuscript. “On the other hand”; “Top to bottom”; “Back of the envelope” and many other expressions are in a grey area of mostly-informal expressions that can help ameliorate otherwise boring subject matter.

Statistical significance is another large and complex topic, but there are a few common mistakes that can be addressed here. “Significant” is a special word in scientific writing, and almost always comes with a specific and clearly-defined numerical threshold, often (but certainly not always) p < 0.05. When reporting results of statistical tests, do not just say “statistically significant” or “significantly different”, instead describe the direction of the difference (“Red was significantly bigger than Blue”) or the effect size when appropriate, using the p-value (and other necessary details) of the test to allow the reader to evaluate significance: “Blue was 20% smaller than Red (paired t-test, p=0.045, n=30)”.
            Differences, patterns, and other results may be significant, but treatments, experiments, and procedures are not described with that word. “The control treatment was not significant” is a meaningless statement. “There was no significant difference between the control and low-dose treatment” is correct. The p-value you are using to evaluate significance is associated with a specific statistical test, not the structure of the data the test is applied to. As an aside, an experiment with three treatment levels (e.g. low, medium, high) plus a control (zero) has four treatments.
            “Trend” is similarly a precisely-defined statistical term, and always means “A non-zero slope” and is most often applied to something like a regression or an ANOVA with treatments that can be organised in some obvious semi-quantitative way, such as increasing levels of addition of some substance. It is not correct to describe something as “a non-significant trend” or “a trend that was not significant” – if it’s not a significant relationship (with a p-value smaller than your previously-defined alpha), it is not a trend.
            Similarly, statements like “Blue seemed to be larger than Red” are meaningless. If the difference is real it will be significant and you will have the p-value (and other details) to show it. If the difference is not significant, it is not real.

Software can be mentioned either as any other commercial product (“Scientific instruments”, above) or in some cases with a paper citation. The software package R is a good example of the latter (R Core Team, 2013). SPSS (IBM, Armonk, NY, USA) is treated like any other scientific tool that was purchased. You do not need to cite basic, widely-available software used to organise, analyze, and visualize data; don’t cite MS Excel or the drawing program you used to make a figure unless there’s something very special about how you used it (such as a custom-made macro or plugin) or there is a special consideration about using that software to perform the tasks you used it for – the big example here is complex statistical analysis in Excel, which many scientists will tell you to avoid. Different statistical software programs have known concerns with some types of analyses, which is one reason why the stats software is almost always mentioned. Also, the community of R users are very fond of telling the rest of us about the program.

Tense of verbs can vary throughout a document, and within paragraphs as needed. However, in general verb tense should not change within a single sentence, though of course there are exceptions. Typically, the Introduction is a mixture of present and past: “This remains an open question, while previous studies in this area have shown that...” Material and Methods is entirely in past tense except a few sentences that may explain the consequences of particular choices in either present or future tense: “Without this control, water levels can rise beyond safe limits”. Results is typically entirely past tense as well: “Red was significantly larger than Blue”. Discussion sections have the most freedom in this regard, with paragraphs often switching from past “Prior experience led us to hypothesize that...” to present “Blue is bigger than Red” and to future “... Blue will continue to grow bigger than Red” even within single sentences. As usual, consistency is key; if you choose to write in a particular way it’s best to continue that style where appropriate.

Units are placed after the numbers, either with or without a space; consistency is the rule here: if you leave a space for “2 cm”, also leave a space for “400 g” and “3.8 × 10⁻⁶ mol”. In English, the period (.) is used for the decimal place, not the comma (,).
            There are a few exceptions, for special units that are placed before the number. In scientific writing, isotope numbers are placed before the atom they modify in chemical formulae: “¹⁵NH₃”; when spoken, the number is said after the element: “N-fifteen”.
            The dollar sign $ is the most common symbol to appear before the number in non-scientific writing, along with the symbols for other currencies, such as £ (UK pound), ¥ (Japanese yen), and € (euro), and regular Latin-alphabet letters used for some currencies, like R for the South African Rand. The rarely used symbol for cents, ¢, is placed after the number like most other units. If you use ¢, do not include the leading decimal (“35¢”, not “0.35¢”, unless you are actually talking about fractions of a penny), and only use cents for quantities less than one dollar; for a pile of change, spell out the word: “I found four hundred and thirty-three cents under the couch.” One situation where this might come up in a scientific paper is if a photograph includes a coin for scale: “One of the plots in our study with a 25¢ piece for scale.” In peer-reviewed publications, the audience is international, so many readers will not be familiar with common Canadian or American coins; a scale bar is preferred.

Vague, qualitative terms like “several” or “some” are best avoided; use actual numbers instead. Semi-quantitative words like “majority” or “negligible” have built-in assumptions (“more than half” for “majority” and “too few / too small to be important” for “negligible”) and are usually acceptable. If in doubt, use a number. This can be a simple quantity (“Some” becomes “Seven”) or a fraction or percentage (“Most” becomes “30 out of 40”).

Thursday, April 12, 2018

The Surprising Breadth of a PhD


I recently served on the committee of a PhD student who defended their Proposal, near the end of the first year of the PhD. The student’s Proposal ended up being rated less-than-satisfactory, despite a letter grade for the graduate course somewhere in the A- / B+ range. The major reason for the requirement to edit the document and add material was a perceived lack of “thinking at the PhD level”. This is a hard-to-define yet widely agreed-upon phenomenon among the professors I have spoken with, and that attitude has certainly percolated down to post-docs and PhD students and other members of the Academy as well. I do not disagree with it, generally, though I expect to continue to argue minutiae about what is and is not included in any given specific case.

Rather than trying to do that for boring hypothetical cases, or clumsily attempting to maintain anonymity for real cases, I’d like to talk about a related issue, that of the surprising breadth of a PhD. I myself was surprised to discover that core competence and skills development directly related to my PhD project were necessary but not sufficient for a PhD. There are obvious requirements at the start of a PhD: learn the skills for the methods, learn the relevant current and historical literature, collect the data, complete the analyses, write. Some aspects within that list become clear through time and are not surprising, such as requirements to gain fluency in certain software programs, or to be able to visualize one’s data in useful and insightful ways. Plus the never-ending quest to improve one’s writing abilities.

The surprise – and this is universal among PhD students in my experience – is the requirement for skills and activities (and effort and time and capacity to discuss) far outside of one’s project. “Leadership” abilities, which are extremely poorly defined and vague. “Well-rounded” qualities, which almost always appear to be irrelevant trivia or useless distractions from the “real” work. Qualitative judgements rather than quantitative evaluations, both of and by the student. And a wide range of so-called “soft” skills that go far beyond “don’t screw things up for your labmates” and “don’t piss off your professor”.

The first reaction to this surprise, coming as it usually does on the heels of some negative evaluation, is a mixture of anger and denial. What the hell does pondering “big questions” have to do with my measurements? No, I disagree! I study X, which is completely unrelated to Y. And so forth. I’m not going to argue that everything dropped on a student in a difficult and emotionally draining committee meeting is important for this nebulous demonstration of “PhD thinking” but I do argue that some of these things are important.

Start with the negative, to get it out of the way. I have yet to read a philosophy of science piece – book, blog post, newspaper article, whatever – that I have found interesting or useful. There was a philosophy of science book on my reading list required for my first attempt at a PhD, part of my assigned work prior to my Comprehensive Exam. I read it, because it was assigned, and I took notes and tried to read it carefully because I expected to be asked questions about it during the Exam. I can’t remember if any questions directly related to that book were asked or not, but I do remember not being impressed by the book. The author spent almost the entire time discussing hypothetical situations that Galileo might have found himself in, and how his invention of the Scientific Method would have translated into some chain of logic or series of actions that this person who died several centuries ago might have carried out. It was long-winded, even at less than 200 pages, and felt entirely irrelevant. My feelings on that have not changed, but I think I’ll save dumping on philosophy of science for another time. 

On to the positive, then. The actual relevance of leadership skills and other away-from-project activities was explained to me by my PhD advisor in a context that made their utility immediately clear: scholarship applications are evaluated in a structured, pre-defined way that includes significant weight for such things. I did some activities that I found enjoyable in any case, and then happily discovered that writing about these activities was a good way to fill in a useful section on scholarship applications; I wrote a paragraph about helping to bring a public speaker to a locally-hosted conference, and another paragraph about some of my photos that have been published in a few places. I believe these two paragraphs, and others, were instrumental in my successful application for the NSERC CGS-D scholarship I was awarded.

A few years ago, Jeremy Fox requested more advice for people at earlier stages of an academic career. I suspect he was mainly thinking of his faculty colleagues, but I just read his piece today and this concept of a surprise inside every PhD occurred to me based on my recent committee experience, which was interesting for a great many reasons beyond this.

Friday, April 06, 2018

The Death of the Scientific Paper


I’m sitting in my office at Université Laval, waiting for an opportunity to speak with my professor, and procrastinating on revising a manuscript. My procrastination, almost always, is to read the internet, and today I’ve found a new article from The Atlantic, “The Scientific Paper Is Obsolete”.

The main thesis of this article is that the scientific paper as we know it today has outlived its utility. The author, James Somers, opens with a description of the niche the scientific paper was invented to fill: a short, incremental advance published as widely as a book but as readable as a letter, and permanent where a lecture is ephemeral. I’ve had conversations with academics in social sciences or humanities disciplines who express their surprise that books, which for argument’s sake are publications longer than about 100 pages, almost never appear in the list of citations in my scientific publications. I list 11 publications – scientific papers – on my C.V. with me as an author (always one of several; I have no sole-author publications) and I’m first author on 7 of those; this means I did most of the actual writing. I feel this experience gives me some perspective to evaluate the article in The Atlantic.

There are the expected jabs at the style and perceived readability of scientific papers, a criticism so widespread and consistent that I now mostly ignore it. I get it, you don’t get the enjoyment of reading a scientific paper that you get out of reading something else, and you put the blame largely on the abundant jargon and dense prose of typical scientific papers; James Somers also adds some mentions of “mathematical symbols”, which is indeed one major feature of many scientific papers that separates them from written works intended for a wider, non-specialist audience. But that’s the point – the intended audience of a scientific paper is not the general public, it’s other experts in that discipline. Know your audience. I guess James Somers does – scientists and non-scientists decrying the difficult prose of scientific papers to non-scientists is very popular in popular science articles.

This isn’t to say that a scientific paper cannot be or should not be highly readable to non-specialists and other members of the general public, but to approach a scientific paper as a non-specialist and then complain about the jargon is to miss the point. I think one has to approach a scientific paper from a position of self-knowledge, in that I have to read a paper outside my area of expertise in a different (and more difficult) way compared to reading a paper that might cite my own work.

Another major difference between a scientific paper and something like an article in The Atlantic – and these two categories are of similar word-count, on average – is the abundant citations in a scientific paper. Every fact, every suggestion, every piece of information in a scientific paper that is not derived directly from the study itself will be cited; credit is given to the prior work that established those facts or provided those suggestions (unless the fact or suggestion is obvious or already widely known and established; we don’t cite Scheele and Priestley (1772) when talking about oxygen, for example). I find myself wishing for some citations and outside attributions while reading this Atlantic article because James Somers makes so many claims that I would like to dispute.

For example, here’s the third paragraph of the article:

The more sophisticated science becomes, the harder it is to communicate results. Papers today are longer than ever and full of jargon and symbols. They depend on chains of computer programs that generate data, and clean up data, and plot data, and run statistical models on data. These programs tend to be both so sloppily written and so central to the results that it’s contributed to a replication crisis, or put another way, a failure of the paper to perform its most basic task: to report what you’ve actually discovered, clearly enough that someone else can discover it for themselves.

Are papers really longer in 2018 than they were, on average, in 1998, or 1978, or 1888? Are they more “full of jargon and symbols”? Are the majority of analytical computer programs “so sloppily written”?
And what replication crisis? Mr. Somers, have you not read the recent counterargument to the crisis-in-science narrative by Dr. Fanelli, recently published by PNAS?

Moving on, one major criticism is that scientific papers are not a good way to express and describe complex results. Animations, something computers are quite good at, are useful tools for visualizing such complex concepts but are very difficult to express on a static sheet of paper, which the modern PDF (Portable Document Format) emulates. I agree, but I do not agree with the follow-up point that this renders the PDF hopelessly useless. A scientific paper is about the words, not the pictures or other visualizations. It’s about the information. Expressing that information in a way the audience can understand and use is the key skill of writing a scientific paper, and is distinct from the skills that create written material intended to be read by as wide an audience as possible. A scientific paper relies heavily on absolute honesty, and presenting all of the available and relevant information to allow the reader to independently decide to agree or not with the author’s arguments and conclusions. A magazine article pushes a particular interpretation of some phenomenon. A scientific paper pushes the phenomenon and then describes one (or sometimes more) possible interpretation of that phenomenon, usually in light of similar phenomena and potential alternative interpretations. A graph is not data, it's an expression of data. An animation is not an argument, it's one support for an argument.

Visualization is a technique, a way to take obscure numbers and show the patterns they contain. I struggle with it, constantly. The paper I am procrastinating working on right now has some decent figures* in it and I don’t see a need for a great deal of work on the visualization side of this paper. I have another project I’m working on that is at a much earlier stage and my current activities there are primarily concerned with visualization. I’m at the “data exploration” stage, where I throw the metaphorical spaghetti of the data at the metaphorical wall and see what sticks. That means lots and lots of images, mostly graphs I get my computer to make for me, and some scribbles on paper in my notebook.

*A figure is any image in a scientific paper, a photograph or map or, most commonly, a graph illustrating the mathematical relationship between two or more parameters. I tend to write papers by making the figures first, but that's a personal style and subjective workflow thing, and certainly not universal among scientists.

Back to The Atlantic

It’ll be some time before computational notebooks replace PDFs in scientific journals, because that would mean changing the incentive structure of science itself. Until journals require scientists to submit notebooks, and until sharing your work and your data becomes the way to earn prestige, or funding, people will likely just keep doing what they’re doing.

This is more interesting to me than the preceding description of competing formats for “computational notebooks”. I have seen suggestions from other people that concentrate on changing other aspects of scientific publishing, often the abolition of for-profit publishing companies (e.g. Here), but these suggestions and discussions do not express a dissatisfaction with the basic unit of scientific communication, the scientific paper. What would my job look like if both scientific papers and the way in which they are disseminated were to go away? Would I just be uploading lumps of code and data tables to some institutional server, whenever I feel like my analyses have answered some tiny question? Does my "Literature Cited" section just become a link-dump?


“At this point, nobody in their sane mind challenges the fact that the praxis of scientific research is under major upheaval,” Pérez, the creator of Jupyter [one of the competing calculation notebooks – MB], wrote in a blog post in 2013. As science becomes more about computation, the skills required to be a good scientist become increasingly attractive in industry. Universities lose their best people to start-ups, to Google and Microsoft. “I have seen many talented colleagues leave academia in frustration over the last decade,” he wrote, “and I can’t think of a single one who wasn’t happier years later.”

I had to look up the definition of “praxis”; I think it’s exactly what I was talking about: what does my job look like if the scientific paper and scientific publishing are drastically changed? Dr. Pérez apparently thinks my job would not change much. I’m not so sure.

There’s also a problem in that paragraph with a possible logical fallacy: confirmation bias. Lots of sad people leave, and then you find a few of them later and they’re happier. Well, good! Happier people is a good thing. But to then claim that it was the act of leaving that made them happier, and then extend that by implication that everybody should consider leaving, is to stretch beyond the available information into unsupported (and idealistic) speculation. If the only people who left were the unhappy people, then what about the happy people who stayed? Would they have also become even more happy had they left? Did the people who stayed unhappy, or became more unhappy after leaving, avoid talking to you?

At this point I’m wandering away from the discussion about scientific papers. And I think the article did, too. It concludes with a weak suggestion that maybe some new tools will be useful (who could disagree with that? Tools are useful by definition) and that, hey Galileo, right?

I remain unconvinced of the impending death of the scientific paper. What I got out of this article was a description of some computer programmers and physicists with generally poor social skills but good ideas and skills related to generating and analyzing data, and the implication that the time I spend teaching ESL graduate students how to write better English, in the demanding, highly technical style of current scientific communication, is somehow wasted.