How to read a research paper

By Charles Sutton on November 4, 2017

There’s lots of advice you can read about how to read a research paper. There’s some good advice in this paper:

S. Keshav. How to read a paper. ACM SIGCOMM Computer Communication Review, 37(3):83–84, 2007.

But there’s one tip that I can offer you to organise your reading of a paper that I can’t remember seeing elsewhere. Ask yourself:

What is the 5 minute summary that you would give to a Very Smart Friend?

I don’t understand a paper until I can explain the paper to a smart person who hasn’t read it. I need to be able to explain enough to the VSF so that she understands: what problem the paper is trying to solve, what sort of methods it uses, and how it relates to the literature, i.e., what it adds.

But there are two rules.

Rule 1: You have to use your own words, summarising the paper without looking at it. If you find yourself repeating sentences from the paper, then you haven’t internalised the paper’s message.

Rule 2: You cannot take anything the paper says at face value. You can assume that the authors won’t lie to you. But they might oversell a bit, and if you are an independent expert, you might not agree with everything they claim, or with how they interpret the new evidence that they have provided. Or you might be able to describe what’s going on a little bit better than they managed. What do you think that they have shown?

Another way of saying this: I know that my imaginary Very Smart Friend will jump on me if I say something inaccurate. So I don’t want to make a claim to my iVSF unless I can argue for it, based on what I have learned from the theory and experiments in the paper. If I just say something like “well, the authors claim X,” but X is controversial, or even dubious, then my iVSF will immediately want to know why they say that and whether they really have evidence, and I had better have an answer.

It can also be good to try this exercise even before you are done. After reading the introduction, how well can you guess what the methods will be, even before you read them? Then read to see if you were right.

To sum up, I hope that I’ve convinced you that having an imaginary friend can help you in your research. You might not want to tell everyone on the internet that you have an imaginary friend, as I have just done, because it might not improve their respect for you. But hey, if it’s good for your research, then where are your priorities?

My special-est email folder

By Charles Sutton on October 7, 2017

I’ve tried a lot of different methods to organize my email. None of them work.

After many years of trying, I’ve come to the conclusion that the Gmail solution of “put all the email in one bucket and search” is the way to go. Over the years I have gotten very, very good at searching my email. But that’s a topic for another day.

Today, I wanted to mention a special email folder that I have. Maybe it’s the most important folder I have, even though I might only look at it once a year, if that.

What could be so special? Emails to pick me up when I’m feeling down. You know which ones I mean. The emails that contained job offers, promotions, grant awards, or even just kind words from trusted senior colleagues. I archive emails like this, emails with special professional successes, in their own folder. Then every once in a while, if I’ve been dealing with a difficult situation or am having a particularly bad bout of impostor syndrome, I can open up the folder and look at it just for a minute, and then I feel a little bit better.

You might ask why these are professional emails. Hopefully I’ve had happy things happen in my personal life as well? Don’t be too worried. It’s just that my personal life doesn’t usually happen over email. That’s what my wedding photos are for!

I got this idea from a blog post somewhere by some other professor, but I don’t remember who or where. If you do, please let me know in the comments!

Update: Rob Zinkov has kindly reminded me that I learned about this idea from the excellent 7 year postdoc blog post by Radhika Nagpal about life for new faculty members. Thanks!

Tags: advice, email

Making unblinding manageable: Towards reconciling prepublication and double-blind review

By Charles Sutton on September 19, 2017

There has been a lively debate recently about the review process for research paper submissions, and how to deal with the fact that double-blind review becomes more difficult when many papers are prepublished on sites like arxiv.org before submission.

This discussion is becoming increasingly important, as we have conducted a study which indicates that in 2017, 23% of top-tier CS papers were posted to arXiv. (That figure includes papers posted both during and after peer review.)

I’m going to start from two assumptions: double blind reviewing is good and prepublication is good. You can disagree with either assumption, or you could think that double-blind is so much more important than prepublication that it should be preserved at all costs, or vice versa. People hold all of those views, and it would take an even longer post to pull all that apart.

Instead, I’d like to think about how to reconcile these two assumptions, because I do believe them both, and how to obtain an engineering trade-off that aims at most of the advantages of both, most of the time.

A lot of people have said that allowing papers to be prepublished anonymously would be a good compromise. An appealing idea, but I worry that it may be a bad one. Instead, I’ll argue that a good compromise is this: Accept that papers will be de-blinded, but design the double-blind review process to compensate.

Perhaps the underlying point is that the conflict isn’t black and white. For double blind to work, it’s not necessary for 100% of the submissions to be impossible to unblind, i.e., to have author identities that are undiscoverable online. I might even suggest that it’s possible to have effective double blind when all author identities are available online. Just because a paper is unblindable does not mean that the reviewers are unblinded: perhaps they have not seen it, or perhaps they saw it in an email with 100 other papers and don’t remember having seen it.

What shouldn’t we do?

There are some recommendations that I’ve seen that unfortunately I don’t think will work.

Anti-Recommendation 1: I know. Let’s have an anonymous version of arxiv.

Lots of people have suggested allowing authors to prepublish papers anonymously (incidentally, there are amusing precedents for this in the history of mathematics). This could be implemented via an overlay of arxiv, or a new feature added to arxiv itself, that would allow authors to temporarily hide their identity. Let’s call this AnonArxiv.

Submissions to AnonArxiv would be immediately available to all but without the author names. Then, once the paper is accepted, AnonArxiv would reveal the author names, while preserving the time stamp of the anonymous submission. The conference would then require that if submissions are prepublished, they must be prepublished anonymously; any other prepublished submissions will be rejected without review.

I used to think this was a cool idea. Now I don’t. It neglects a fundamental principle that we are sadly all familiar with, that most papers are rejected.

Let’s say I post a great paper to AnonArxiv and submit it to ACL. Like most papers it is rejected. I’m convinced that the reviewers have made a mistake, and so I want to resubmit it to EMNLP. How do we handle this?

We could (1) require resubmissions to remain anonymous. After rejection, I must choose whether to unblind the submission on AnonArxiv, in which case it cannot be resubmitted to other conferences, or whether to keep the submission anonymous, in which case the paper could spend a year-plus as anonymous, until it is finally accepted. This seems an unreasonable choice to force onto authors.

Or we could (2) allow authors, after one rejection, to unblind their AnonArxiv submissions and resubmit to a future conference. This has the benefit that papers only spend a few months as public-but-anonymous, which is not so bad. But I’m not sure it works. For one, this is difficult to enforce, because apart from the honor system, the information about whether a paper was previously submitted is confidential (keep in mind that the original submission might have been outside the NLP community). But more fundamentally, what would we do for first-time submissions that violate this rule, reject them without review? How would we justify doing that when there are many second-time submissions whose authors are already public?

This would also mean that if I submit a paper to a conference outside of NLP which allows prepublication and get rejected, I would not be able to subsequently decide that an NLP conference would be a better fit, and resubmit there. It might be possible to implement a special dispensation in this case, though.

With some regret, I come to the conclusion that AnonArxiv won’t work. That said, AnonArxiv variant (2) might work if a large enough percentage of submissions were first time submits. Then we’d have the majority of the papers on AnonArxiv, and hence not unblindable, which might be good enough.

Perhaps Anti-Recommendation: Require prepublication to be declared

ACL 2017 required all submissions to declare if they had been prepublished. Reviewers were notified that a paper had been prepublished. Prepublished papers that were not declared as such were summarily rejected. Unfortunately I don’t understand the rationale for how this stringent requirement was meant to help. Remember, the goal is not to prevent all papers from being unblind-able, it’s to prevent too many papers from being unblind-ed.

This could be a good idea if the hope is to warn reviewers that they should be careful about searching the web for the paper’s title during the review process. The problem with this idea though, is that it does not help if the authors very reasonably prepublish the paper just after submission. So really, all reviewers need to be careful, all the time, and the extra heads up maybe isn’t too helpful.

If the idea was simply to gain more information about how many papers are prepublished, then I totally agree with asking the question, but I do not see why penalties for non-compliance were necessary.

If the idea was to handle prepublished papers differently in the review process than non-prepublished ones, then I am not sure why this is necessary. Instead, I’ll advocate below that we handle the review process of unblinded papers differently.

So I would argue that it might make sense for conferences to request that authors declare prepublication, but that no penalties for noncompliance be used in future years.

Recommendation-But-That’s-Actually-An-Orthogonal-Issue: Let’s use OpenReview.Net

I’ve also read the suggestion that the NLP community switch to running conferences on OpenReview. I love OpenReview, and I eagerly await the day when I can go onto a site like OpenReview and pull up any paper in computer science, from any venue, from any year past or present, and find a lively and informative discussion online.

But OpenReview is a software platform, not a reviewing process. It’s specifically designed to allow conference chairs to configure what information about the reviews and authors should be made public and when. It’s not designed to answer the policy questions about whether submissions should be anonymous and what happens after they are rejected.

All right, wise guy, so …?

One way to square the circle is to try adapting reviewing norms to adjust and compensate for the fact that it is more likely for papers to be unblinded.

Conferences are already doing these things, so I don’t claim to have any new ideas. But I think it’s useful to bring together the arguments for these ideas, rather than having programme chairs have to reconstruct these arguments for themselves every year.

Recommendation 1: Clarify Norms for Reviewers

Even if the author information for all submissions is public online, reviewers, area chairs, and programme chairs can take steps to reduce the chances that a submission is unblinded, and to minimize the consequences when one is.

Reviewers should avoid making Web searches that would be likely to reveal the authors of the paper. It can often happen that a well-meaning search for related work inadvertently turns up an unblinded copy of the paper. I am not saying that reviewers should never search for related work, but it carries risks — it always has (ten years ago I had a reviewer of one of my papers deblinded by a tech report) — and reviewers should try to avoid it.

If the reviewer feels that a Web search is necessary, they should hold off until they have read the paper completely and formed an initial impression of it. This allows reviewers to apply the bias of cognitive dissonance to counteract the potential bias of unblinding.

If reviewers learn the author identities, then they must let the relevant person — who could variously go by the title “programme chair”, “programme committee member”, “meta-reviewer”, “senior PC member”, “area chair”, etc; I’ll use the term “area chair” (AC) — know this right away.

Area chairs should be prepared to apply their judgment to weigh the reviewers’ comments differently when some reviewers are unblinded. Consider a paper like this: it has two negative reviews and one positive review, but the positive reviewer has been unblinded, the paper comes from a famous group so there is a possibility of unconscious bias, and the AC believes that the negative comments have merit. Then the AC should be prepared to give less weight to the positive reviewer. In other examples, perhaps all three reviews are positive, or the authors are lesser-known, and so unlikely to engender positive bias. Then downweighting unblinded reviewers may not be necessary.

Programme chairs should carefully write their instructions to reviewers and area chairs to make these expectations clear. They should also be prepared to assist ACs with borderline cases where there is a possibility of bias.

We don’t know if these steps alone will reduce the percentage of unblind-ed submissions to an acceptable level. For example, if the percentage of unblinded reviews reaches, say, 80%, this recommendation becomes more like a band-aid that would be unlikely to preserve the benefits of double-blind review. Which brings us to the next point.

Recommendation 2: Measure and Monitor

Much of the heat around this discussion may be because we are, as it were, debating in the blind. It is not difficult to gather more evidence than we have now:

  • We should be able to measure and publicize the percentage of submissions and accepted papers which are prepublished.

  • Although a bit more delicate, we should be able to estimate the percentage of submissions which are not first time submits.

  • If we follow recommendation 1, we should also be able to monitor the percentage of reviews which are unblinded and the percentage of submissions which have had 1, 2, or 3 unblinded reviews.

  • We should also attempt to record measures of diversity in the accepted papers in terms of authors and institutions. We should keep tracking those measures, monitor for decreases, and check for the presumed negative correlation with the percentage of unblinded submissions.
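
As a rough illustration of how little bookkeeping the third bullet needs, here is a minimal sketch in Python. It assumes that each review record carries a flag saying whether the reviewer reported being unblinded, as Recommendation 1 asks. The `Review` record, the `unblinding_stats` function, and the example data are all made up for illustration; they are not taken from any real conference system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical record; field names are assumptions for this sketch.
    submission_id: str
    unblinded: bool  # True if the reviewer reported learning the authors' identities


def unblinding_stats(reviews):
    """Return (percent of reviews unblinded,
    {k: percent of submissions with exactly k unblinded reviews})."""
    if not reviews:
        return 0.0, {}
    pct_reviews = 100.0 * sum(r.unblinded for r in reviews) / len(reviews)

    # Count the unblinded reviews for each submission (zero counts are kept).
    per_submission = Counter()
    for r in reviews:
        per_submission[r.submission_id] += int(r.unblinded)

    # Histogram over submissions: how many have 0, 1, 2, ... unblinded reviews.
    histogram = Counter(per_submission.values())
    n = len(per_submission)
    pct_by_count = {k: 100.0 * v / n for k, v in sorted(histogram.items())}
    return pct_reviews, pct_by_count


# Made-up example: four submissions with three reviews each.
reviews = [
    Review("A", True), Review("A", False), Review("A", False),
    Review("B", False), Review("B", False), Review("B", False),
    Review("C", True), Review("C", True), Review("C", False),
    Review("D", False), Review("D", False), Review("D", False),
]
pct_reviews, pct_by_count = unblinding_stats(reviews)
print(pct_reviews)   # 25.0
print(pct_by_count)  # {0: 50.0, 1: 25.0, 2: 25.0}
```

The same counts, tracked year over year, would also give the time series needed to compare against the diversity measures in the last bullet.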

Updated 8 May 2018: In discussion after this post was first published, a colleague pointed me to the PLDI FAQ on double blind, which has some good ideas. Another idea which I am starting to see gain traction is to have a “blackout period” in which authors are expected not to post their papers on arXiv or social media starting from a month before the conference deadline, and continuing throughout the review period. This is another interesting compromise which seems to nicely handle the resubmission problems of the AnonArxiv approach.

The not-so-great checked vs carry on debate

By Charles Sutton on September 2, 2017

This time, some advice of a lighter sort. How light? Depends how well you pack!

I hear a lot of frequent air travelers say that you should never check bags if you can avoid it, but I think it’s not always clear cut.

I do usually travel with only a carry on. This has a lot of advantages:

  • You don’t need to queue at the ticket counter to drop off bags. If you’ve checked in online (using your phone is easiest, but printing your boarding pass is still fine), then you can go straight to security and the gate.

  • You don’t need to wait at baggage claim on arrival. This is an especially big win if you are arriving late at night.

  • It’s easier to switch flights at the last minute. You might wonder why you’d want to do this if you’ve planned ahead. You’ll stop wondering when your first flight arrives early and you wish you could jump on an earlier connecting flight. Or when your first flight is delayed but you still have a chance to catch a different connecting flight. I’m still annoyed at the time United Airlines prevented me from doing this by checking my bag at the gate unnecessarily.

All that said, for some trips it can be more convenient to check your bags, even if you could squeeze into a smaller bag. Why?

  • You can bring a bigger bag that holds more. (duh)

  • You don’t need to lug the bag around the airport. This removes stress during a long connection.

  • You can fill up your checked bag with liquids and gels. This is especially useful if you’d like to bring gifts of food and drink back to your friends, family, or perhaps to a select group of academic bloggers who you particularly admire.

  • If you are running a few minutes late to the gate, the plane is more likely to wait for you, because if they don’t, they need to find and remove your bag from the hold.

For direct flights I think it’s clear that carry on wins most of the time, but if you have a connection then it may well depend on your itinerary.

Tags: travel tips

A first lecture on time management

By Charles Sutton on August 5, 2017

Several students have recently asked me for advice about time management. When people ask you an important and difficult question like this, usually the best thing is to think of someone else who can give a better answer than you. For time management, an obvious person to turn to is the late Randy Pausch, a noted computer scientist who became an internet sensation because of an inspirational lecture that he gave after he had been diagnosed with terminal cancer.

Today I’d like to recommend a different, more practical, and excellent lecture that he gave on time management:

You should start by watching the lecture. I found it helpful to have the slides and bullet points in separate windows on screen as I watched.

If you need further encouragement to watch the lecture, I can say that what amazed me, as poor at time management as I may be, was the number of suggestions that I actually use day-to-day. I had entirely forgotten just how much I had learned the first time I saw this material. For example:

  • Find your most creative time of day and defend it ruthlessly. The most important tip that you can learn for creative work. (I was interested to learn later that your most creative time can actually change over the years.)

  • Turn off your email notification sound. Do it now!

  • Learn ways to say “no” gently. It makes saying “no” easier.

  • Don’t ever delete emails. Search instead.

  • Keep a todo list and calendar. My today-self says, “Well, duh.” But there was a time when I didn’t. You should start now!

  • Get a speakerphone so that you can get other things done while on hold with customer service. Now, mobile phones or even Skype have a speakerphone option that works for this.

  • Professors should keep Kleenex in their office. Yes, we all need this.

  • Keep a clock on the wall behind where your visitors sit. Much less obtrusive than a wristwatch.

  • Write down one-minute minutes. So helpful! Just the main takeaways and actions for next time.

  • Keep a time clock. A big topic that I hope to write more about later.

There’s also an equally long list of good advice that I had forgotten, but for that, you’ll need to watch the lecture yourself!