A Research Rubric

Subtitle: Software to search, sort, store and summarize science

The Perfect Bet

You want to learn something new. How do you do it? With easy access to the Internet, your first thought might be to let Google do the searching. Searching for terms about a subject often turns up more commonly used terms, and these new phrases may lead to areas you hadn’t thought about initially. You can bookmark your searches, organize them into folders, and add tags to identify them using keywords in your bookmark manager.

Your search can be more organized than just collecting web page links into bookmarks. With the many new software tools available, you can search a topic efficiently, add tags and annotations to scientific papers, and automatically summarize the key concepts. To illustrate these ideas, I’m going to demonstrate the process by finding literature on predicting the outcomes of soccer matches.

The Goodreads summary of Adam Kucharski’s The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling says,

For the past 500 years, gamblers - led by mathematicians and scientists - have been trying to figure out how to pull the rug out from under Lady Luck. In The Perfect Bet, mathematician and award-winning writer Adam Kucharski tells the astonishing story of how the experts have succeeded, revolutionizing mathematics and science in the process. The house can seem unbeatable. Kucharski shows us just why it isn’t. Even better, he demonstrates how the search for the perfect bet has been crucial for the scientific pursuit of a better world.

The book covers how to beat casino games and bet on sports, and it gives optimal stock trading techniques. We thought that soccer might be a fun way to begin learning about predicting outcomes and betting on the results. But as Jan pointed out, soccer games are typically low-scoring. Kucharski’s solution is in-game betting. Which team will take the next shot at the goal? Which team will control the ball the longest over the next five minutes?

Kucharski quoted an article from The Post Game, Betting After The Games Are Underway, about Cantor Fitzgerald, the financial services firm that started CG Technology to handle these bets. A 2016 ESPN article says that CG Technology was fined $22.5M for illegal gambling and money laundering. It appears that William Hill Race & Sportsbook has acquired parts of CG Technology, but it’s not clear if in-game betting on soccer is still available. So, maybe we won’t be able to bet on games, but even if we can, we still need to figure out how to do it. We need the prediction part first.

The Process

Aaron Tay’s Musings about librarianship is a marvelous blog about conducting online research. Aaron is a Library Analytics Manager at Singapore Management University and is the founder of Initiative for Open Abstracts to promote unrestricted access to scholarly research. One of his blog posts, Navigating the literature review, describes the literature review process and contains some useful online tools. At the end of this post, I’ll provide a more complete set of tools, but since Aaron has already picked out the ones he thinks are likely to be the most useful, we’ll go through his process.

Normally, I’d recommend finding an expert or two as a first step, but “soccer gambling” is such an esoteric concept it’s unlikely that you’d find a real expert in both soccer and gambling. We’ll resort to doing online searches for this project. Aaron also has a post, Top new tools for researchers worth looking at with many good searching methods.

Simply reading about a subject doesn’t make you an expert, though. You have to experiment with and practice the new concept. “How do you get to Carnegie Hall?”, “Practice, man, practice.”

Briefly, an outline of the method is:

  1. Make a folder for storing research articles, and set up a reference manager tool.
  2. Look for papers containing key words to get an idea about commonly used terms in the field.
  3. Find papers providing an overview of the subject, with references to more detailed articles.
  4. Map the literature space by linking papers by similar topics, or through citations.
  5. Read and summarize papers you’ve found, taking notes of important topics or methods mentioned.

Set up a Reference Manager

Before beginning the search, we need to be able to save journal articles. Create a PDF folder somewhere on your hard drive. You can add subfolders if it helps keep some sort of organization to the papers. Most good reference managers will be able to handle a jumble of papers thrown into a single folder.
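As a minimal sketch of this step, the Python below creates a PDF folder with optional subfolders and lists every PDF underneath it. The function names are hypothetical, chosen only for illustration, and the layout is whatever suits you.

```python
from pathlib import Path


def make_pdf_folder(root, subfolders=()):
    """Create the PDF folder and any optional subfolders; return the root path."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name in subfolders:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def list_pdfs(root):
    """Return every PDF under root, including those in subfolders, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling make_pdf_folder("research-pdfs", ["soccer", "reviews"]) would create the folder in the current directory, and list_pdfs("research-pdfs") would then show what a reference manager will find when it scans the folder and its subfolders. The folder names here are assumptions, not a required layout.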

Aaron recommends Zotero, but alternatives I like are Qiqqa and Docear. The site AlternativeTo recommends Mendeley as another good reference manager with features similar to Zotero. Of course, Qiqqa claims it’s much better than Zotero, and since I’ve been using Qiqqa for a while I’ll explain how it works. Qiqqa recently became open source, so the latest version is available on GitHub. A quick tutorial is available on YouTube, and the Qiqqa manual is also online.

After you have saved articles to your PDF folder, start Qiqqa, click on “Guest” and at the top left corner you should see “Add PDFs or References”. Click the drop-down arrow and then “Add Folder”. Navigate to your PDF folder and select it, allowing subfolders. Qiqqa will then read each of the PDFs in your PDF folder using OCR, automatically adding tags and creating bibliographies for each paper. It may take a few minutes depending on how many papers you’ve downloaded, but when Qiqqa is finished you can select papers by tag, author, publisher, year, or custom search criteria. Double-clicking on the paper brings it up in a new tab.

When downloading a new PDF, you’ll often find that the file name is something like PhysRevX.7.041052.pdf when the title of the paper is Quantum-Assisted Learning of Hardware-Embedded Probabilistic Graphical Models. You don’t need to change the file name to match the paper title because Qiqqa will read the title and store it using the more readable form.

Top Level or Keyword Searching

We want to figure out how to gamble on soccer. Or should we be betting on football? The Northeastern University Library recommends combining key terms and truncating search words in their list of the top ten search tips. Since we’re not sure that in-game gambling is still possible, we might be better off looking for a method or strategy to predict the outcome using something like “soccer AND prediction”. Synonyms for “prediction” found in a thesaurus are “forecast”, “guess”, “indicator”, and “prognosis”, so we could try combinations of those as well.
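Trying every combination by hand gets tedious, so here is a small Python sketch that generates one “AND” query per combination of terms. The function name boolean_queries is hypothetical, and the term lists are just the ones discussed above.

```python
from itertools import product


def boolean_queries(*term_groups):
    """Yield one AND query for every combination of one term per group."""
    for combo in product(*term_groups):
        # Quote multi-word phrases so a search engine treats them as a unit.
        parts = [f'"{term}"' if " " in term else term for term in combo]
        yield " AND ".join(parts)


sports = ["soccer", "football"]
# Synonyms for "prediction" taken from the thesaurus list above.
forecasts = ["prediction", "forecast", "prognosis"]

for query in boolean_queries(sports, forecasts):
    print(query)
```

This prints six queries, from “soccer AND prediction” through “football AND prognosis”, which can be pasted into a search box one at a time. Truncated terms are left out of the sketch because support for truncation varies from one search engine to another.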

A Google search for “soccer prediction” returns lots of predictions other people are making about soon-to-be played games such as these from FiveThirtyEight. FiveThirtyEight gives a detailed description of how they produce their predictions which mostly rely on ESPN’s Soccer Power Index (SPI). James Curley provided the data and R code used by FiveThirtyEight on GitHub.

Arthur Caldas wrote an article on Medium, Beating soccer odds using Machine Learning — Project Walkthrough that shows how to scrape data from the web, clean the downloaded data, and generate good features to use in the prediction.

A search on Google Scholar shows additional terms like “machine learning”, “neural network”, “model”, “bayesian”, and “results”. This search returns papers such as

Other search engines such as Microsoft Academic, BASE, and ScienceOpen provide alternate searching methods. BASE returned these links to very relevant papers,

We should also take advantage of the papers referenced in The Perfect Bet:

The Perfect Bet also mentions the Journal of Quantitative Analysis in Sports and the MIT Sloan Sports Analytics Conference, which may provide useful insights.

Review Papers

Review or “meta” papers are written by experts on a particular subject, and describe the current state of science on that topic. Review papers typically describe the work by many scientists with references to papers they’ve written. Review papers can be found with 2Dsearch, using methods discussed by Aaron Tay in a Medium article.

2Dsearch

When you go to the 2Dsearch website, you’ll see a blank area on the left and a results window on the right. Enter terms anywhere in the search term window. I chose “soccer”, “review”, and “prediction”. Right-click on any word to bring up a list of suggested terms, and choose the most relevant ones. You can group similar terms by dragging a box around them, so in the Forecasting box I have “prediction”, “forecast”, “estimation”, and “projections”.

In the upper right corner of each box you can set the Boolean operator to “AND” or “OR” (upper case). Group all the search terms into a single box, labeled here as “Meta-analysis of soccer games prediction”, and choose the Boolean “AND”.
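The grouping can be pictured as a nested Boolean expression: terms inside a box are joined with one operator, and the boxes are joined with another. The Python sketch below joins each box with “OR” and the boxes with “AND”, with an optional list of excluded terms. The function names are hypothetical, and the exact query string 2Dsearch sends to each database may differ from this generic form.

```python
def or_group(terms):
    """Join the terms of one box with OR; parenthesize only when needed."""
    terms = list(terms)
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


def build_query(boxes, exclude=()):
    """AND the boxes together, then append a NOT clause for each excluded term."""
    query = " AND ".join(or_group(terms) for terms in boxes)
    for term in exclude:
        query += f" NOT {term}"
    return query


boxes = [
    ["soccer"],
    ["review"],
    ["prediction", "forecast", "estimation", "projections"],  # the Forecasting box
]
print(build_query(boxes))
```

Passing exclude=["injury"] appends “NOT injury” to the same query, which is a handy way to filter out an unwanted topic.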

In the results window, choose the search space such as Lens.org. Other possibilities are Google, Google Scholar, PubMed, IEEE Xplore, and others. Double-click on any paper to bring it up in a new tab. The three lines with a red dot in the upper left corner open the global menu. A very useful starting point is the “How to use” button which will present introductory videos.

Using the search terms shown as well as “NOT injury” led to several useful papers,

Another useful search tool is Semantic Scholar. After entering the search terms, use the drop-down menu under “Publication Type” and select “Review” and “Meta Analysis”. This returned,

Citation Mapping

Aaron’s post More research/literature mapping tools - Connected Papers and CoCites reviews several citation mapping tools, and I liked Connected Papers best because it’s browser-based and very fast. Type the name of a paper in the search bar and Connected Papers will display the abstract. Click on the abstract to get a graph of related papers,

connected-papers

Hovering over one of the circles displays the title, authors, and abstract of that paper, and lets you open it in a new tab. The closer a circle is to the original paper, the more similar it is in content. Darker colors represent newer papers. This is a similarity graph, not a citation tree, but Citation Gecko is a way to see how papers are linked through citations. When you start Citation Gecko, it asks for a seed paper

citation-gecko-seed-papers.png

and after entering Dolores: a model that predicts football match outcomes from all over the world other recommended seed papers are listed in a new pop-up window. After selecting the most relevant papers, click “Add selected seed papers” at the bottom of the window, which opens a new view.

citation-gecko-recommended-papers.png

This lets you see links between your seed papers and others. You can add some of these as new seed papers, or you can follow the link to the paper. In many cases, you can download the paper by following the link.
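To make the difference between a citation link and a similarity link concrete, here is a small Python sketch. It uses bibliographic coupling (the overlap between two reference lists) as the similarity measure. That is one common choice, and I’m not claiming it is the measure Connected Papers uses. The paper and reference names are hypothetical.

```python
def cites(refs, a, b):
    """True when paper a lists paper b among its references (a direct citation link)."""
    return b in refs.get(a, set())


def coupling_similarity(refs, a, b):
    """Jaccard overlap of two reference lists (bibliographic coupling)."""
    refs_a, refs_b = refs.get(a, set()), refs.get(b, set())
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0


# Hypothetical papers and reference lists, invented for illustration.
refs = {
    "paper_a": {"ref_1", "ref_2", "ref_3"},
    "paper_b": {"ref_2", "ref_3", "ref_4"},
    "paper_c": {"paper_a"},
}

print(cites(refs, "paper_a", "paper_b"))                # False: no direct citation
print(coupling_similarity(refs, "paper_a", "paper_b"))  # 0.5: half their references overlap
print(cites(refs, "paper_c", "paper_a"))                # True: a direct citation
```

Papers a and b never cite each other, yet they share two of their four distinct references, so a similarity graph would place them close together while a citation tree would not link them at all.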

Another useful search tool is the Local Citation Network, which generates a list of citations based on a DOI (Digital Object Identifier) or list of DOIs and provides a graphical representation of the linked papers. Tim Wölfle, the author of Local Citation Network, explains the differences between it and Citation Gecko on the Leiden Madtrics site.

Ujjal Marjit wrote a nice blog post describing the tool, Free Visualization Tool to Support Literature Survey.

local-citation-network.png
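The general idea of a local citation network can be sketched in a few lines of Python: start from a set of input papers, keep the citations that run between them, and rank outside papers by how many of the input papers cite them. This is only an illustration of the concept, not the tool’s implementation, and the identifiers below are made-up placeholders, not real DOIs.

```python
from collections import Counter


def local_network(refs):
    """Return (edges, suggestions) for a set of input papers.

    refs maps each input paper's identifier (for example a DOI) to the set of
    identifiers it cites. Edges are citations between two input papers;
    suggestions are outside papers ranked by how many input papers cite them.
    """
    inputs = set(refs)
    edges = sorted(
        (src, dst) for src, cited in refs.items() for dst in cited if dst in inputs
    )
    outside = Counter(
        dst for cited in refs.values() for dst in cited if dst not in inputs
    )
    suggestions = sorted(outside.items(), key=lambda item: (-item[1], item[0]))
    return edges, suggestions


# Placeholder identifiers, invented for illustration only.
refs = {
    "10.0000/a": {"10.0000/b", "10.0000/x"},
    "10.0000/b": {"10.0000/x", "10.0000/y"},
    "10.0000/c": {"10.0000/a", "10.0000/x"},
}

edges, suggestions = local_network(refs)
print(edges)
print(suggestions)
```

In this toy data, the outside paper cited by all three input papers comes out at the top of the suggestions, which is the kind of paper worth reading next.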

Summarizing and Annotating

Finding and downloading a lot of papers is pretty pointless by itself. The readings will continue until morale improves. You might have collected weeks’ worth of reading material, so we’ll need a quick way to extract important points from each paper.

Unlike reading a novel, you shouldn’t read a paper from start to finish. Read the abstract and key terms, then look for the important topics in each section. Look up any terms you don’t understand, and summarize the main points. Elsevier has an infographic outlining the process with links to papers that go into more depth.

Scholarcy developed an AI paper summarizer extension for Chrome and Edge that reads the paper currently open in your browser and generates tag words of key concepts, writes a summary, and gives an overview of the paper’s methods, results, discussion, conclusions, and future work sections.

The summary is in Markdown format that can be read into editors like Zettlr or Obsidian. Key concepts highlighted by Scholarcy have links to Wikipedia articles to quickly understand unfamiliar terms. Click on a reference in the Markdown summary and a link will open in Google Scholar, scite_, or it will open the paper. Scholarcy mangles equations, but it does a pretty good job of giving you the gist of the article.

scholarcy.png

An alternative to Scholarcy is paper-digest, described by Ujjal Marjit in How to Generate an Automatic Summary of Research Paper. The summary is sparser than Scholarcy’s but may be useful for a quick understanding of the basic outline.

paper-digest.png

As mentioned earlier, Qiqqa (open source) manages your documents, but also has many tools for searching and annotating your documents. After starting Qiqqa, the home screen opens. Click on the drop-down icon “Add PDFs or References” and select “Add Folder” to import your library into Qiqqa. Qiqqa uses OCR to read the documents. If you have added new documents since you first started Qiqqa, it reads those as well. When it has finished, you will have a “Guest” library, which looks something like this:

qiqqa-guest.png

In the left column are common tags and the number of papers with those tags. You can filter by Qiqqa Autotags (tags Qiqqa assigned), author, publication, year, theme (groups of common tags), or publication type such as article, book, or proceedings.

A search box in the upper right corner lets you find articles containing specific words or phrases. Here, I’ve searched for “pi-rating” and Qiqqa found seven papers with that term, sorting them by relevancy. Click on the search score block (yellow to red with percentages to the left of the title) to see page numbers where the term occurs in the paper.

qiqqa-search-score.png
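If you want a feel for what a search like this does, here is a rough Python sketch that counts how often a term occurs in each paper and reports the pages where it appears. Raw counting is only a stand-in for Qiqqa’s relevance score, which I haven’t seen documented, and the library contents below are hypothetical.

```python
import re


def search_pages(library, term):
    """Rank papers by how often term occurs and report the pages where it appears.

    library maps a paper title to a list of page texts (page 1 first).
    Returns (title, total_hits, page_numbers) tuples, most hits first.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    results = []
    for title, pages in library.items():
        hits = [len(pattern.findall(text)) for text in pages]
        total = sum(hits)
        if total:
            page_numbers = [
                number for number, count in enumerate(hits, start=1) if count
            ]
            results.append((title, total, page_numbers))
    return sorted(results, key=lambda row: (-row[1], row[0]))


# Hypothetical page texts, invented for illustration.
library = {
    "Paper one": ["The pi-rating system", "no match here", "Pi-rating again, pi-rating twice"],
    "Paper two": ["nothing relevant", "pi-rating once"],
    "Paper three": ["unrelated"],
}

for title, total, page_numbers in search_pages(library, "pi-rating"):
    print(title, total, page_numbers)
```

The search is case-insensitive and treats the term as literal text, so the hyphen in “pi-rating” is matched as written.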

You can highlight a reference, right-click and search the web for the paper using the built-in Qiqqa browser.

qiqqa-browser.png

As you read the paper, you can highlight important sections,

qiqqa-highlighting.png

and add annotations and tags.

qiqqa-annotation.png

Annotations become searchable so you don’t need to remember where they are in the paper. Qiqqa has many more features described in the online manual.

Hopefully, you’ll be able to use some of these tools to improve your research skills. They won’t make you an expert, but at least you’ll be better informed.


A More Complete List of Tools

Below is a more complete list of software tools available for research. You may find some to be better than the ones described above for your work.

Navigating the literature review: Aaron Tay summarizes literature review tools.

Top new tools for researchers worth looking at: Search tools, statistics software, data cleaning and machine learning.

Top Level Searches

Top Ten Search Tips

Literature Mapping

List of Innovative Literature mapping tools

Secondary (Deeper) Searches

Retrieval

Seven ways to download papers

Summarizers

Academic related browser extensions

Citations

Organizers

Data Sources

Paywalls

Improving access and delivery of academic content

7 Ways How to Download Research Papers for Free