A Research Rubric
Subtitle: Software to search, sort, store and summarize science
# The Perfect Bet
You want to learn something new. How do you do it? With easy access to the Internet, your first thought might be to let Google do the searching. Searching for terms about a subject often turns up more commonly used terms, and these new phrases may lead to areas you hadn’t thought about initially. You can bookmark your searches, organize them into folders, and add tags to identify them using keywords in your bookmark manager.
Your search can be more organized than just collecting web page links into bookmarks. With the many new software tools available, you can search a topic efficiently, add tags and annotations to scientific papers, and automatically summarize the key concepts. To illustrate these ideas, I’m going to demonstrate the process by finding literature on predicting the outcomes of soccer matches.
The Goodreads summary of Adam Kucharski’s The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling says:
For the past 500 years, gamblers - led by mathematicians and scientists - have been trying to figure out how to pull the rug out from under Lady Luck. In The Perfect Bet, mathematician and award-winning writer Adam Kucharski tells the astonishing story of how the experts have succeeded, revolutionizing mathematics and science in the process. The house can seem unbeatable. Kucharski shows us just why it isn’t. Even better, he demonstrates how the search for the perfect bet has been crucial for the scientific pursuit of a better world.
The book shows how to beat casino games and bet on sports, and it describes optimal stock trading techniques. We thought soccer might be a fun way to begin learning about predicting outcomes and betting on the results. But as Jan pointed out, soccer games are typically low-scoring. Kucharski’s solution is in-game betting: which team will take the next shot at goal? Which team will control the ball longest over the next five minutes?
Kucharski quoted an article from The Post Game, Betting After The Games Are Underway, about Cantor Fitzgerald, the financial firm behind CG Technology (formerly Cantor Gaming), which was set up to handle these bets. A 2016 ESPN article says that CG Technology was fined $22.5M for illegal gambling and money laundering. It appears that William Hill Race & Sportsbook has acquired parts of CG Technology, but it’s not clear whether in-game betting on soccer is still available. So maybe we won’t be able to bet on games, but even if we can, we still need to figure out how. We need the prediction part first.
# The Process
Aaron Tay’s Musings about librarianship is a marvelous blog about conducting online research. Aaron is a Library Analytics Manager at Singapore Management University and the founder of the Initiative for Open Abstracts, which promotes unrestricted access to scholarly research. One of his blog posts, Navigating the literature review, describes the literature review process and covers some useful online tools. At the end of this post I’ll provide a more complete set of tools, but since Aaron has already picked out the ones he thinks are most likely to be useful, we’ll follow his process.
Normally, I’d recommend finding an expert or two as a first step, but “soccer gambling” is such an esoteric topic that you’re unlikely to find a real expert in both soccer and gambling. We’ll rely on online searches for this project. Aaron also has a post, Top new tools for researchers worth looking at, that covers many good search methods.
Simply reading about a subject doesn’t make you an expert, though. You have to experiment with and practice the new concept. “How do you get to Carnegie Hall?”, “Practice, man, practice.”
Briefly, an outline of the method is:
- Make a folder for storing research articles, and set up a reference manager tool.
- Look for papers containing key words to get an idea about commonly used terms in the field.
- Find papers providing an overview of the subject, with references to more detailed articles.
- Map the literature space by linking papers by similar topics, or through citations.
- Read and summarize papers you’ve found, taking notes of important topics or methods mentioned.
# Set up a Reference Manager
Before beginning the search, we need to be able to save journal articles. Create a PDF folder somewhere on your hard drive. You can add subfolders if it helps keep some sort of organization to the papers. Most good reference managers will be able to handle a jumble of papers thrown into a single folder.
Aaron recommends Zotero, but alternatives I like are Qiqqa and Docear. The site AlternativeTo recommends Mendeley as another good reference manager with features similar to Zotero. Of course, Qiqqa claims it’s much better than Zotero, and since I’ve been using Qiqqa for a while I’ll explain how it works. Qiqqa recently became open source, so the latest version is available on GitHub, a quick tutorial is available on YouTube, and the Qiqqa manual is also online.
After you have saved articles to your PDF folder, start Qiqqa, click on “Guest”, and at the top left corner you should see “Add PDFs or References”. Click the drop-down arrow and then “Add Folder”. Navigate to your PDF folder and select it, allowing subfolders. Qiqqa will then read each of the PDFs in your PDF folder using OCR, automatically adding tags and creating bibliographic entries for each paper. It may take a few minutes depending on how many papers you’ve downloaded, but when Qiqqa is finished you can select papers by tag, author, publisher, year, or custom search criteria. Double-clicking on a paper brings it up in a new tab.
When downloading a new PDF, you’ll often find that the file name is something like PhysRevX.7.041052.pdf when the title of the paper is Quantum-Assisted Learning of Hardware-Embedded Probabilistic Graphical Models. You don’t need to change the file name to match the paper title, because Qiqqa will read the title and store it in the more readable form.
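Qiqqa does this for you. If your reference manager doesn’t, or you simply want readable file names on disk, here is a minimal sketch that looks up a paper’s title from its DOI through the public Crossref REST API (`https://api.crossref.org/works/{doi}`, whose JSON response lists titles under `message.title`) and turns the title into a safe file name. The function names are hypothetical, and the sketch assumes you already know the paper’s DOI.

```python
import json
import re
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def title_from_crossref_json(payload: str) -> str:
    """Pull the first title out of a Crossref /works/{doi} JSON response."""
    data = json.loads(payload)
    titles = data.get("message", {}).get("title") or []
    if not titles:
        raise ValueError("no title in Crossref record")
    return titles[0]


def fetch_title(doi: str, timeout: float = 10.0) -> str:
    """Network call: look up a DOI in Crossref and return its title."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)  # keeps the "/" in the DOI
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return title_from_crossref_json(resp.read().decode("utf-8"))


def safe_filename(title: str, max_len: int = 120) -> str:
    """Turn a paper title into a file-system-friendly PDF file name."""
    name = re.sub(r'[<>:"/\\|?*]', "", title)  # characters Windows forbids
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs of whitespace
    return name[:max_len].rstrip() + ".pdf"
```

For the example above, `safe_filename(fetch_title("10.1103/PhysRevX.7.041052"))` should give a name built from the paper’s title, assuming the DOI mirrors the file name the way APS downloads usually do; check the DOI before relying on it.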
# Top Level or Keyword Searching
We want to figure out how to gamble on soccer. Or should we be betting on football? The Northeastern University Library recommends combining key terms and truncating search words in its list of top ten search tips. Since we’re not sure that in-game gambling is still possible, we might be better off looking for a method or strategy to predict the outcome, using something like “soccer AND prediction”. Synonyms for “prediction” found in a thesaurus are “forecast”, “guess”, “indicator”, and “prognosis”, so we could try combinations of those as well.
A Google search for “soccer prediction” returns lots of predictions other people are making about soon-to-be-played games, such as these from FiveThirtyEight. FiveThirtyEight gives a detailed description of how they produce their predictions, which mostly rely on ESPN’s Soccer Power Index (SPI). James Curley provided the data and R code used by FiveThirtyEight on GitHub.
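Typing every combination of synonyms by hand gets tedious. A minimal sketch of combining key terms, assuming the common syntax of `AND`, `OR`, parentheses, and quoted phrases (support for `*` truncation varies between search engines, so truncated stems are simply passed through). `or_group` and `build_query` are hypothetical helper names.

```python
def or_group(terms):
    """OR together synonyms, quoting multi-word phrases and dropping duplicates."""
    unique = []
    for term in terms:
        term = term.strip().lower()
        if term and term not in unique:  # skip repeats such as a doubled synonym
            unique.append(term)
    if not unique:
        raise ValueError("empty synonym group")
    parts = [f'"{t}"' if " " in t else t for t in unique]
    return parts[0] if len(parts) == 1 else "(" + " OR ".join(parts) + ")"


def build_query(*groups):
    """AND together several synonym groups, e.g. topic terms and method terms."""
    return " AND ".join(or_group(g) for g in groups)
```

For example, `build_query(["soccer", "football"], ["prediction", "forecast", "prognosis"])` produces `(soccer OR football) AND (prediction OR forecast OR prognosis)`, and a stem such as `predict*` is left untouched for engines that support truncation.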
Arthur Caldas wrote an article on Medium, Beating soccer odds using Machine Learning — Project Walkthrough that shows how to scrape data from the web, clean the downloaded data, and generate good features to use in the prediction.
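Caldas’s walkthrough is the place to see his actual features. As a generic illustration (not his code), one common kind of feature is a team’s recent form, for example the average goals it scored in its last few games. The key detail is to use only games played *before* each match, so the feature never leaks the result it is supposed to predict. The tuple layout and `form_features` are assumptions made for this sketch.

```python
from collections import defaultdict, deque


def _mean(values):
    return sum(values) / len(values) if values else None


def form_features(matches, window=5):
    """Average goals scored by each side over its previous `window` games.

    `matches` is a list of (home, away, home_goals, away_goals) tuples sorted
    by date. Each match only sees games played before it, so the features
    never include the result they are meant to predict.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for home, away, home_goals, away_goals in matches:
        rows.append({
            "home": home,
            "away": away,
            "home_form": _mean(history[home]),  # None if no earlier games
            "away_form": _mean(history[away]),
        })
        # Record this result only after the features above were computed.
        history[home].append(home_goals)
        history[away].append(away_goals)
    return rows
```

A team’s first match gets `None` rather than a guessed value; how to fill those gaps is a modelling choice.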
A search on Google Scholar shows additional terms like “machine learning”, “neural network”, “model”, “bayesian”, and “results”. This search returns papers such as
- Incorporating domain knowledge in machine learning for soccer outcome prediction
- Neural underpinnings of superior action prediction abilities in soccer players
- Optimizing the Prediction Process: From Statistical Concepts to the Case Study of Soccer
Other search engines, such as Microsoft Academic (retired at the end of 2021), BASE, and ScienceOpen, provide alternative search methods. BASE returned these links to very relevant papers:
- The Open International Soccer Database for machine learning
- The 2017 Soccer Prediction Challenge
- pi-football: A Bayesian network model for forecasting Association Football match outcomes
We should also take advantage of the papers referenced in The Perfect Bet:
- The birth process model for association football matches
- A mixed-effects model for identifying goal-scoring ability of footballers
- Forecasting sports tournaments by ratings of (prob)abilities: A comparison for the EURO 2008
- How computer analysts took over at Britain’s top football clubs
- How Does the Past of a Soccer Match Influence Its Future? Concepts and Statistical Analysis
- How efficient is the European football betting market? Evidence from arbitrage and trading strategies
- How the spreadsheet-wielding geeks are taking over football
- Joint modelling of goals and bookings in association football
- Just how unpredictable is the Premier League? Scientists have done the maths
- Modelling association football scores and inefficiencies in the football betting market
- Professionals Play Minimax
- why spain will win…
- Why the power of one is overhyped in football
- World Cup Stats Prof: I was right all along
The Perfect Bet also mentions the Journal of Quantitative Analysis in Sports and the MIT Sloan Sports Analytics Conference, which may provide useful insights.
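Several of the score models above start by treating goals as counts. The simplest version of that idea, a common textbook baseline rather than the exact method of any paper listed here, assumes each team’s goals are independent Poisson variables with expected values λ_home and λ_away. Summing the probability of every scoreline then gives win, draw, and loss probabilities. The rates you feed in are placeholders; estimating them is the hard part the papers address.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(lam_home, lam_away, max_goals=10):
    """P(home win), P(draw), P(away win) assuming independent Poisson goals."""
    home = [poisson_pmf(k, lam_home) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, lam_away) for k in range(max_goals + 1)]
    p_home = p_draw = p_away = 0.0
    for i, ph in enumerate(home):
        for j, pa in enumerate(away):
            if i > j:
                p_home += ph * pa
            elif i == j:
                p_draw += ph * pa
            else:
                p_away += ph * pa
    # Scorelines above max_goals are ignored, so renormalise to sum to 1.
    total = p_home + p_draw + p_away
    return p_home / total, p_draw / total, p_away / total
```

Calling `outcome_probs(1.5, 1.1)` with made-up rates returns three probabilities that sum to one. Refinements such as correlated scores or time-varying team strength are where the papers listed above go further.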
# Review Papers
Review or “meta” papers are written by experts on a particular subject and describe the current state of science on that topic. They typically summarize the work of many scientists, with references to the papers those scientists have written. Review papers can be found with 2Dsearch, using methods Aaron Tay discusses in a Medium article.
When you go to the 2Dsearch website, you’ll see a blank area on the left and a results window on the right. Enter terms anywhere in the search term window. I chose “soccer”, “review”, and “prediction”. Right-click on any word to bring up a list of suggested terms, and choose the most relevant ones. You can group similar terms by dragging a box around them, so in the Forecasting box I have “prediction”, “forecast”, “estimation”, and “projections”.
In the upper right corner of each box you can set the Boolean operator to “AND” or “OR” (upper case). Group all the search terms into a single box, labeled here as “Meta-analysis of soccer games prediction”, and choose the Boolean “AND”.
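Each 2Dsearch box behaves like a parenthesised group in an ordinary Boolean string. The sketch below shows that correspondence using the terms from this search. It does not reproduce the exact syntax 2Dsearch sends to each database, and `Group` and `render` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Group:
    op: str  # "AND", "OR" or "NOT", like the operator set on a box
    items: List[Union[str, "Group"]] = field(default_factory=list)


def render(node) -> str:
    """Flatten nested boxes into a one-line Boolean query."""
    if isinstance(node, str):
        return f'"{node}"' if " " in node else node  # quote phrases
    if node.op == "NOT":
        if len(node.items) != 1:
            raise ValueError("NOT takes exactly one item")
        return "NOT " + render(node.items[0])
    inner = f" {node.op} ".join(render(item) for item in node.items)
    return f"({inner})" if len(node.items) > 1 else inner
```

For example, an outer AND box holding “soccer”, “review”, a Forecasting OR box, and a NOT box for “injury” renders as `(soccer AND review AND (prediction OR forecast OR estimation OR projections) AND NOT injury)`.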
In the results window, choose a search space such as Lens.org. Other possibilities include Google, Google Scholar, PubMed, and IEEE Xplore. Double-click on any paper to bring it up in a new tab. The three lines with a red dot in the upper left corner open the global menu. A very useful starting point is the “How to use” button, which presents introductory videos.
Using the search terms above, plus “NOT injury”, led to several useful papers:
- On Modelling Soccer Data
- Modeling outcomes of soccer matches
- The ARCANE Project: How an Ecological Dynamics Framework Can Enhance Performance Assessment and Prediction in Football
- Predicting Sports Results with Artificial Intelligence – A Proposal Framework for Soccer Game
Another useful search tool is Semantic Scholar. After entering the search terms, use the drop-down menu under “Publication Type” and select “Review” and “Meta Analysis”. This returned:
- Forty years of score-based soccer match outcome prediction: an experimental review
- Dolores: a model that predicts football match outcomes from all over the world
- Score-based soccer match outcome modeling – an experimental review
- A Review on Football Match Outcome Prediction using Bayesian Networks
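The same filter can be scripted. This sketch assumes the Semantic Scholar Academic Graph API’s paper-search endpoint (`/graph/v1/paper/search`) and its `publicationTypes` parameter with the values `Review` and `MetaAnalysis`, which, as far as I know, mirror the website menu. Check the current API documentation before relying on it, and note that unauthenticated requests are rate-limited. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def review_search_url(query, limit=20):
    """Build a paper-search URL restricted to reviews and meta-analyses."""
    params = {
        "query": query,
        "publicationTypes": "Review,MetaAnalysis",  # same filter as the website
        "fields": "title,year,externalIds",
        "limit": limit,
    }
    return S2_SEARCH + "?" + urllib.parse.urlencode(params)


def search_reviews(query, limit=20):
    """Network call: return (year, title) pairs for matching papers."""
    with urllib.request.urlopen(review_search_url(query, limit), timeout=10) as resp:
        data = json.load(resp)
    return [(p.get("year"), p.get("title")) for p in data.get("data", [])]
```

Calling `search_reviews("soccer match outcome prediction")` would need network access. Building the URL separately makes it easy to inspect or paste into a browser first.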
# Citation Mapping
Aaron’s post More research/literature mapping tools - Connected Papers and CoCites reviews several citation mapping tools, and I liked Connected Papers best because it’s browser-based and very fast. Type the name of a paper in the search bar and Connected Papers will display the abstract. Click on the abstract to get a graph of related papers.
Hovering over one of the circles displays the title, authors, and abstract of that paper, and lets you open it in a new tab. The closer a circle is to the original paper, the more similar it is. Darker colors represent newer papers. This is a similarity graph, not a citation tree; to see how papers are linked through citations, try Citation Gecko. When you start Citation Gecko, it asks for a seed paper.
After I entered Dolores: a model that predicts football match outcomes from all over the world, other recommended seed papers were listed in a new pop-up window. After selecting the most relevant papers, click “Add selected seed papers” at the bottom of the window, which opens a new view.
This lets you see links between your seed papers and others. You can add some of these as new seed papers, or you can follow the link to the paper. In many cases, you can download the paper by following the link.
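Connected Papers says its similarity measure is based on co-citation and bibliographic coupling, while Citation Gecko follows direct citation links. The toy sketch below, which is not Connected Papers’ actual algorithm, computes those two measures on a hypothetical dictionary mapping each paper ID to the IDs it cites. Two papers are *coupled* when they share references, and *co-cited* when a third paper cites both.

```python
from itertools import combinations


def coupling_and_cocitation(refs):
    """refs maps each paper ID to the set of paper IDs it cites.

    Returns two dicts keyed by sorted pairs of paper IDs:
      coupling[a, b]   = number of references a and b share
      cocitation[a, b] = number of papers that cite both a and b
    """
    coupling, cocitation = {}, {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            coupling[(a, b)] = shared
    for cited in refs.values():
        for a, b in combinations(sorted(cited), 2):
            cocitation[(a, b)] = cocitation.get((a, b), 0) + 1
    return coupling, cocitation
```

With `refs = {"P1": {"X", "Y"}, "P2": {"X", "Y", "Z"}, "P3": {"Z"}}`, P1 and P2 share two references, and X and Y are co-cited twice, so both pairs would sit close together on a similarity graph.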
Another useful search tool is Local Citation Network, which generates a list of citations from a DOI (Digital Object Identifier) or list of DOIs and provides a graphical representation of the linked papers. Tim Wölfle, the author of Local Citation Network, explains the differences between it and Citation Gecko on the Leiden Madtrics site.
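If your starting list is a messy bibliography or notes file rather than clean DOIs, a short script can pull the DOIs out first. This sketch uses the regular expression Crossref has published for matching most modern DOIs. It is an assumption that it catches every DOI in your list, since older DOIs can use other characters. `extract_dois` is a hypothetical name.

```python
import re

# Pattern Crossref has suggested for modern DOIs (covers most, not all, DOIs).
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(text):
    """Return the unique DOIs found in text, in order of first appearance."""
    seen, found = set(), []
    for match in DOI_RE.finditer(text):
        # Trim punctuation that usually belongs to the surrounding sentence;
        # this may over-trim the rare DOI that genuinely ends in ")".
        doi = match.group(0).rstrip(".,;:)")
        key = doi.lower()  # DOIs are case-insensitive
        if key not in seen:
            seen.add(key)
            found.append(doi)
    return found
```

The resulting list can be pasted into Local Citation Network one DOI per line.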
Ujjal Marjit wrote a nice blog post describing the tool, Free Visualization Tool to Support Literature Survey.
# Summarizing and Annotating
Finding and downloading a lot of papers is pretty pointless by itself. The readings will continue until morale improves. You might have collected weeks’ worth of reading material, so we’ll need a quick way to extract the important points from each paper.
Unlike reading a novel, you shouldn’t read a paper from start to finish. Read the abstract and key terms, then look for the important topics in each section. Look up any terms you don’t understand, and summarize the main points. Elsevier has an infographic outlining the process with links to papers that go into more depth.
Scholarcy developed an AI paper summarizer extension for Chrome and Edge that reads the paper currently open in your browser and generates tag words of key concepts, writes a summary, and gives an overview of the paper’s methods, results, discussion, conclusions, and future work sections.
The summary is in Markdown format, which can be read into editors like Zettlr or Obsidian. Key concepts highlighted by Scholarcy link to Wikipedia articles, so you can quickly understand unfamiliar terms. Clicking a reference in the Markdown summary opens it in Google Scholar or scite_, or opens the paper itself. Scholarcy mangles equations, but it does a pretty good job of giving you the gist of the article.
An alternative to Scholarcy is paper-digest, described by Ujjal Marjit in How to Generate an Automatic Summary of Research Paper. Its summary is sparser than Scholarcy’s but may be useful for a quick understanding of the basic outline.
As mentioned earlier, Qiqqa (open source) manages your documents and also has many tools for searching and annotating them. When you start Qiqqa, the home screen opens. Click on the drop-down icon “Add PDFs or References” and select “Add Folder” to import your library into Qiqqa. Qiqqa uses OCR to read the documents. If you have added new documents since you first started Qiqqa, it reads those as well. When it has finished, you will have a “Guest” library.
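If you keep your own notes alongside those summaries, a consistent template makes them easier to search in Obsidian or Zettlr. This is a hypothetical template, not Scholarcy’s export format: a Markdown note with YAML front matter holding the title and tags, followed by summary and key-point sections. Hyphens replace spaces in tags because Obsidian tags can’t contain spaces.

```python
from pathlib import Path


def note_markdown(title, tags, summary, key_points):
    """Build a Markdown note with YAML front matter for title and tags."""
    safe_title = title.replace('"', '\\"')  # escape quotes for YAML
    lines = ["---", f'title: "{safe_title}"', "tags:"]
    lines += [f"  - {t.strip().replace(' ', '-')}" for t in tags]
    lines += ["---", "", f"# {title}", "", "## Summary", "", summary, "",
              "## Key points", ""]
    lines += [f"- {p}" for p in key_points]
    return "\n".join(lines) + "\n"


def write_note(folder, title, tags, summary, key_points):
    """Write the note into a vault folder and return its path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    name = "".join(c for c in title if c.isalnum() or c in " -_").strip() or "untitled"
    path = folder / f"{name}.md"
    path.write_text(note_markdown(title, tags, summary, key_points), encoding="utf-8")
    return path
```

Pointing `write_note` at your vault folder (a hypothetical path of your choosing) creates one note per paper, which the editor will then index by tag.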
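This is not how Scholarcy or paper-digest work, since their methods aren’t described here. It is a toy version of the classic frequency-based *extractive* approach, which helps show what an automatic summary can and can’t do. Each sentence is scored by how common its non-trivial words are across the whole text, and the top sentences are kept in their original order. The stopword list is a small assumption.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
             "was", "were", "with"}


def _words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=2):
    """Keep the n sentences whose words are most frequent in the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(_words(text))

    def score(sentence):
        words = _words(sentence)
        return sum(freq[w] for w in words) / (len(words) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)
```

Because it only copies existing sentences, a summarizer like this never paraphrases, and it says nothing about equations or figures.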
In the left column are common tags and the number of papers with those tags. You can filter by Qiqqa Autotags (tags Qiqqa assigned), author, publication, year, theme (groups of common tags), or publication type such as article, book, or proceedings.
A search box in the upper right corner lets you find articles containing specific words or phrases. Here, I’ve searched for “pi-rating” and Qiqqa found seven papers with that term, sorting them by relevancy. Click on the search score block (yellow to red with percentages to the left of the title) to see page numbers where the term occurs in the paper.
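Qiqqa’s relevance score is its own. As a rough illustration of the idea, using raw hit counts rather than Qiqqa’s algorithm, here is a search over hypothetical page text that has already been extracted by OCR. It reports which pages contain the term and ranks papers by total hits.

```python
def search_library(library, term):
    """library maps a paper title to a list of page texts.

    Returns (title, hit_count, pages) tuples, most hits first; pages are 1-based.
    """
    term = term.lower()
    results = []
    for title, pages in library.items():
        hit_pages = [i for i, text in enumerate(pages, start=1) if term in text.lower()]
        if hit_pages:
            hits = sum(text.lower().count(term) for text in pages)
            results.append((title, hits, hit_pages))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

Real tools usually weight rare terms more heavily and normalise for document length; raw counts favour long papers.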
You can highlight a reference, right-click and search the web for the paper using the built-in Qiqqa browser.
As you read the paper, you can highlight important sections and add annotations and tags.
Annotations become searchable so you don’t need to remember where they are in the paper. Qiqqa has many more features described in the online manual.
Hopefully, you’ll be able to use some of these tools to improve your research skills. They won’t make you an expert, but at least you’ll be better informed.
# A More Complete List of Tools
Below is a more complete list of software tools available for research. You may find some to be better than the ones described above for your work.
Navigating the literature review: Aaron Tay summarizes literature review tools.
Top new tools for researchers worth looking at: Search tools, statistics software, data cleaning and machine learning.
# Top Level Searches
Top Ten Search Tips
- 2Dsearch Instead of entering Boolean strings into one-dimensional search boxes, queries are formulated by manipulating objects on a two-dimensional canvas.
- Academia.edu Download 28 million PDFs for free
- arXiv a free distribution service and an open-access archive for 1,978,106 scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
- BASE is one of the world’s most voluminous search engines especially for academic web resources.
- Digital Library of the Commons is a gateway to the international literature on the commons.
- DOAJ Independent database contains over 16,500 peer-reviewed open access journals covering all areas of science, technology, medicine, social sciences, arts and humanities.
- EndNote Click Save time accessing full-text PDFs with the free EndNote Click browser plugin.
- ERIC allows you to search by topic for material related to the field of education.
- Google search Free search engine provided by Google
- Google scholar indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.
- OpenDOAR is the quality-assured, global Directory of Open Access Repositories.
- papergraph is an online visual tool to understand the latest literature in a given research community.
- PLOS a nonprofit, Open Access publisher empowering researchers to accelerate progress in science and medicine by leading a transformation in research communication.
- Read by QXMD lets you create a personalised feed that is updated daily with new papers on research topics or from journals of your choice.
- refseek locates relevant academic search results from web pages, books, encyclopedias, and journals.
- ScienceOpen provides researchers with a wide range of tools to support their research – all for free.
- scinapse Find papers from over 170m papers in major STEM journals.
# Literature Mapping
List of Innovative Literature mapping tools
- Bibnet Google Scholar Scraper records the top 10 papers or books from a Google Scholar (GS) search. Using Google Scholar’s ‘search within citations’, it checks whether any of the authors recorded in the database have cited any of the publications.
- Citation Gecko finds relevant papers from seed papers
- CiteSpace is an application for visualizing and analyzing trends and patterns in scientific literature.
- Connected Papers is a simple one shot visualization tool using one seed paper.
- Inciteful uses multiple seed papers in an iterative process.
- Litmaps lets you visualize research navigation, citation network search, and team synchronization.
- Local Citation Network helps scientists with their literature review using metadata from Microsoft Academic, Crossref and OpenCitations
- Open Knowledge Maps is the world’s largest visual search engine for scientific knowledge.
- PaperGraph generates graph data from the Semantic Scholar Open Research Corpus using the binaries in dennybritz/papergraph.
- ResearchRabbit provides citation network graphs (network and timeline views) and co-authorship graphs (requires an institutional email).
- VOSviewer is a software tool for constructing and visualizing bibliometric networks
# Secondary (Deeper) Searches
- CoCites uses co-citations of one or more known articles to find related articles.
- CrossRef makes research outputs easy to find, cite, link, assess, and reuse.
- dblp provides open bibliographic information on major computer science journals and proceedings.
- Dimensions is the world’s largest linked research information dataset.
- EThOS searches over 500,000 doctoral theses.
- iris.ai is a world-leading AI engine for scientific text understanding.
- JURN searches millions of free academic articles, chapters and theses.
- OpenAlex is an open, comprehensive catalog of scholarly papers, authors, and institutions, and a drop-in replacement for Microsoft Academic Graph (Dec 2021).
- OpenCitations is an independent not-for-profit infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data by the use of Semantic Web technologies
- OurResearch builds free, open-source tools that are used by millions every day, in universities, businesses, and libraries worldwide, to uncover, connect, and analyze research products.
- Paperity gives readers easy and unconstrained access to thousands of journals from hundreds of disciplines, in one central location.
- PubMed comprises more than 33 million citations for biomedical literature from MEDLINE, life science journals, and online books.
- Semantic Scholar is a free, AI-powered research tool for scientific literature.
- The Lens is an online patent and scholarly literature search facility.
- wizdom.ai Gain powerful insights about the past, present and future with the most comprehensive knowledge graph covering the entire universe of research.
- Zenodo is a general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN. It allows researchers to deposit research papers, data sets, research software, reports, and any other research-related digital artefacts.
- 101 Free Online Journal and Research Databases for Academics
Seven ways to download papers
- CORE is an aggregator of open access research published in research repositories and journals worldwide.
- Directory of Open Access Journals is a community-curated website that lists high quality, peer-reviewed open access journals.
- FreeFullPDF The aim of FreeFullPDF.com is to increase the visibility and ease of use of open access scientific journals, theses, posters and patents.
- Library Genesis is a massive database of over 2.7 million books and 58 million science magazine files.
- Open Access Button provides public repositories of research papers to make publicly funded research accessible to all.
- ScienceOpen is a professional networking platform for scholars that offers access to over 40 million research papers in all areas of science.
Academic related browser extensions
- Crammer Provides text analytics to the webpage using artificial intelligence to quickly find what you are looking for and save time
- esummarizer Automatically summarize any text in a few seconds.
- IntelliPPT Summarize articles, text, websites, essays and documents for free.
- Open Text Summarizer OTS will create a short summary or will highlight the main ideas in the text.
- paper digest Artificial Intelligence summarizes academic articles for you.
- QuillBot is a web-based summarizing tool that lets you take any textual content and derive the most important parts of the information.
- Resoomer filters through your content by essential factors, key topics, and ideas for faster interpretation of the text.
- scholarcy is an online article summarizer that reads your research articles, reports, and book chapters in seconds and breaks them down into bite-sized sections, so you can quickly assess how important any document is to your work.
- SpinBot is a text rewriter, article spinner, and content creating tool.
- summarizebot summarizes information in shareable documents, images, and audio files.
- Summary Generator is an online text summarizer based on open source text summarization software.
- synopsis AI-powered content extraction and summarization for webpages and articles.
- CiteAs is a convenient tool to obtain the correct citation for any publication, preprint, software or dataset in one click.
- citationchaser An input article list can be used to return a list of all referenced records, and/or all citing records in the Lens.org database
- scite displays the context of the citation and describes whether the article provides supporting or contrasting evidence.
- BibSonomy helps you to manage your publications and bookmarks, to collaborate with your colleagues and to find new interesting material for your research.
- Docear helps you organize, create, and discover academic literature.
- JabRef Easily retrieve and link full-text articles.
- Mendeley manages and shares academic knowledge.
- Obsidian is a knowledge base on top of a local folder of plain text Markdown files.
- OpenPaper.work sorts all your papers by turning them into searchable documents.
- Polar is an integrated reading environment to build your knowledge base.
- Qiqqa combines PDF reference management tools, a citation manager, and a mind map brainstorming tool.
- Zotero is a free, easy-to-use tool to help you collect, organize, cite, and share research.
# Data Sources
- Aminer is a free online service used to index, search, and mine big scientific data.
- CIA World Factbook provides basic intelligence on the history, people, government, economy, energy, geography, environment, communications, transportation, military, terrorism, and transnational issues for 266 world entities.
- Common Crawl is an open repository of web crawl data that can be accessed and analyzed by anyone.
- Our World in Data provides research and data to make progress against the world’s largest problems.
- Seshat gathers data into a single, large database that can be used to test scientific hypotheses.
Improving access and delivery of academic content
7 Ways How to Download Research Papers for Free
- Bypass Paywalls is a web browser extension to help bypass paywalls for selected sites.
- Library Genesis is a database of over 2.7 million books and 58 million science magazine files.
- Open Access Button links to free, legal research articles delivered instantly or automatically requested from authors.
- Sci-Hub provides access to academic papers and articles using educational institution access and its own cache of downloaded papers and articles. It is illegal in some countries; if you still want to use Sci-Hub, a VPN, Tor, or Whonix can conceal your location.
- Unpaywall is a massive open database of more than 21 million free scholarly articles.