Domo Knowledge Base

Fun Sample Datasets

Version 20


It's hard learning how to perform advanced data analysis and write meaningful Beast Mode calculations when all you've got to work with is boring old marketing campaign spreadsheets. Believe us, we know. To that end, we present a list of far more interesting datasets you might find useful as you learn how to build cards and analyze data in Domo. 

Not satisfied with the datasets listed here? You can find thousands more on Kaggle, a website where users upload their own datasets and take part in data science competitions. 

200,000+ Jeopardy Questions

This dataset contains all questions and answers from the game show "Jeopardy" from its inception to 2012. It is available in XLSX, CSV, and JSON formats.

This dataset was compiled by Reddit user trexmatt in 2014. To view the Reddit page on which this dataset was originally posted, click here:

Columns in this dataset are as follows:




The question category, e.g. "HISTORY"


The value of the question as a string, e.g. "$200" 

Note: This is "NONE" for Final Jeopardy and Tiebreaker questions.  


The text of the question, e.g. "Calf-length pants styled in colorful island prints are named for this type of Hawaiian party"

Note: This sometimes contains hyperlinks and messy text when there is a picture or video question.  


The text of the answer, e.g. "luau pants" 


The segment of Jeopardy in which the question occurred, either "Jeopardy," "Double Jeopardy," "Final Jeopardy," or "Tiebreaker" (extremely rare)


String representing the sequential order number of the show, e.g. "4680"


The show's air date in the format YYYY-MM-DD
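
Because the question value is stored as a string, it usually needs to be converted before it can be summed or averaged. The following is a minimal Python sketch of that conversion, assuming values look like "$200" (possibly with a thousands separator) and that "NONE" marks Final Jeopardy and Tiebreaker questions as described above. The function name parse_value is hypothetical.

```python
from typing import Optional


def parse_value(raw: str) -> Optional[int]:
    """Convert a dollar string such as "$200" to an int; "NONE" becomes None."""
    text = raw.strip()
    if text.upper() == "NONE":
        # Final Jeopardy and Tiebreaker questions carry no dollar value
        return None
    # Assumption: the only non-digit characters are "$" and ","
    return int(text.replace("$", "").replace(",", ""))


print(parse_value("$200"))
print(parse_value("NONE"))
```

Returning None rather than 0 keeps questions without a value out of averages instead of dragging them down.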

Wine Reviews

This dataset contains wine review data scraped from WineEnthusiast on June 15th, 2017. It is available in CSV and JSON formats. 

This dataset was compiled by Kaggle user zackthoutt. For more information about the dataset and ideas for how to use it, visit

Columns in this dataset are as follows:




The country the wine is from


A few sentences from a sommelier describing the wine's taste, smell, look, feel, etc.


The vineyard within the winery where the grapes that made the wine are from


The number of points WineEnthusiast rated the wine on a scale of 1-100


The cost of a bottle of this wine (in dollars) 


The province or state of the country the wine is from


The wine-growing area of the province or state (e.g. "Napa")


A more specific region within a larger region (may be blank if there is no such smaller region)

taster name

Name of the sommelier who tasted and reviewed the wine


Twitter handle of the sommelier who tasted and reviewed the wine


The title of the wine review (which often contains the vintage, if you are interested in that information)


The type of grapes used to make the wine (e.g. "Pinot Noir")
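
Since the vintage is only available inside the review title, it has to be extracted from the text. The sketch below shows one possible approach in Python: it takes the first four-digit year between 1900 and 2099 that appears in the title. The function name extract_vintage and the sample title are made up for illustration, and a winery whose name contains a year would mislead this simple pattern.

```python
import re
from typing import Optional

# Assumption: a vintage is a standalone four-digit year from 1900 to 2099
VINTAGE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_vintage(title: str) -> Optional[int]:
    """Return the first year-like number in a review title, or None if absent."""
    match = VINTAGE.search(title)
    return int(match.group(0)) if match else None


# Hypothetical title, not taken from the dataset
print(extract_vintage("Example Winery 2013 Reserve Pinot Noir (Napa)"))
```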

120 Years of Olympic History

This dataset provides athlete and event data for all Olympics held from 1896 to 2016. It is available in CSV and XLSX format. 

This dataset was compiled by Kaggle user Randi H. Griffin. For more information about the dataset and ideas for how to use it, visit

When using this dataset, note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered: each is still held every four years, but the two now fall two years apart (Winter Games in 1994, Summer Games in 1996, and so on). A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.

Columns in this dataset are as follows:




An ID number assigned to this athlete based on their sequential order in the dataset 


The name of the athlete


The gender of the athlete


The age of the athlete


The height of the athlete, in centimeters


The weight of the athlete, in kilograms


The country this athlete represents


The three-letter abbreviation for the country represented by the athlete


The year and season for this Olympic event


The year of the event


The season of the event (either Summer or Winter)


The city in which these Olympics were held


The sport of the event


The name of the event


The medal won by the athlete ("NA" if no medal was won)

Superhero Characteristics and Powers

These datasets include basic information for over 700 superheroes (and villains). The first dataset, heroes_information.csv, provides demographic characteristics such as gender, race, comic publisher, etc., while the second dataset, super_hero_powers.csv, maps out the powers for each superhero by assigning Boolean (true/false) values for 168 different superpowers. 

These datasets were compiled by Kaggle user ClaudioDavi. For more information, see

Columns in the heroes_information dataset are as follows:




The name or alias of the superhero


The gender of the superhero


The superhero's race (such as Human, Amazon, Vampire, etc.)

Eye Color

The superhero's eye color

Hair Color

The superhero's hair color

Skin Color

The superhero's skin color


The superhero's height (in centimeters)

Note: Many of the listed superheroes are given a height and weight of -99. The dataset does not explain this value, but it most likely indicates that the information is unknown.   


The superhero's weight (in kilograms)

Note: Many of the listed superheroes are given a height and weight of -99. The dataset does not explain this value, but it most likely indicates that the information is unknown.   


The comics company that created this superhero (such as Marvel, D.C., etc.)


The superhero's overall alignment (good, bad, or neutral)

Because the super_hero_powers dataset includes 168 columns, we do not list them here. However, this dataset is simple to understand. For each superhero, a value of true or false is assigned for each power. For example, the superhero "Banshee" is assigned a value of "TRUE" for "Flight," "Audio Control," "Force Field," "Enhanced Hearing," "Sonar," and "Sonic Scream," and "FALSE" for all other powers.
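
The two points above (the -99 placeholder and the true/false power columns) can be handled with a couple of small helpers. This is a minimal Python sketch, not a definitive cleaning routine: it assumes -99 means "unknown", and the function names, the "hero_names" key, and the "Agility" column in the sample row are hypothetical.

```python
from typing import Dict, List, Optional

UNKNOWN = -99  # assumed placeholder for a missing height or weight


def clean_measure(value: float) -> Optional[float]:
    """Replace the -99 placeholder with None so it is not averaged as real data."""
    return None if value == UNKNOWN else value


def powers_of(row: Dict[str, str]) -> List[str]:
    """List the column names whose value is TRUE for one superhero row."""
    return [name for name, flag in row.items() if flag.strip().upper() == "TRUE"]


# Hypothetical, shortened row; the real file has 168 power columns
row = {"hero_names": "Banshee", "Flight": "TRUE", "Sonic Scream": "TRUE", "Agility": "FALSE"}
print(powers_of(row))
print(clean_measure(-99))
```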

UFO Sightings

These datasets include information on all reported UFO sightings from 1906 to 2014, with time standardization and geocoding. Two datasets in CSV format are linked here. The first of these, UFO_sightings_complete.csv, includes entries in which the location of the sighting was not found or is blank (0.8146%) or in which the time is erroneous or blank (8.0237%). In the second, UFO_sightings_scrubbed.csv, these erroneous and blank entries have been removed.

This data comes from the National UFO Reporting Center (NUFORC). For more information, visit the Kaggle page at

Columns in both datasets are as follows:




The date and time of the sighting, in m/d/yyyy h:mm format


The city in which the UFO was sighted


The U.S. state in which the UFO was sighted (applies to U.S. sightings only; otherwise left blank)


The country in which the UFO was sighted, using a two-letter country abbreviation (such as "gb" for "Great Britain")


The shape of the UFO (such as circular, cigar, etc.)

duration (seconds)

The duration of the sighting in seconds

duration (hours/min)

The duration of the sighting in hours and minutes

date posted

The posted date of the sighting, in m/d/yyyy format


The latitude coordinate of the sighting, in DDD.dddd format


The longitude of the sighting, in DDD.dddd format
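
If you work with the complete (unscrubbed) file, the sighting time may be blank or malformed. The sketch below shows one way to parse the m/d/yyyy h:mm format in Python and fall back to None when parsing fails. It assumes the hour is on a 24-hour clock; the function name parse_sighting_time and the sample value are hypothetical.

```python
from datetime import datetime
from typing import Optional


def parse_sighting_time(raw: str) -> Optional[datetime]:
    """Parse an m/d/yyyy h:mm string; return None for blank or invalid values."""
    try:
        # strptime accepts months, days, and hours without leading zeros
        return datetime.strptime(raw.strip(), "%m/%d/%Y %H:%M")
    except ValueError:
        return None


print(parse_sighting_time("1/5/2004 9:05"))
print(parse_sighting_time(""))
```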

Mushroom Classification

This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota families, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each specimen is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. (This latter class was combined with the poisonous one.) 

This information is recommended only for use in honing your skills in data structuring and analysis. Please DO NOT use this information as a botanical reference/survival guide for determining mushroom edibility. Wild mushrooms should only be identified by professionals.

This dataset is available in XLSX format only. 

Columns in this dataset are as follows:




The edibility of the sample (either edible or poisonous)


The shape of the mushroom's cap (convex, bell-shaped, flat, etc.)


The texture of the mushroom's surface (smooth, scaly, fibrous, etc.)


The color of the mushroom's cap


Indicates whether bruises are present on the mushroom (yes or no)


The smell of the mushroom (pungent, almond, etc.)


Indicates whether the gills of the mushroom are attached or free


Indicates whether the gills are closely spaced or crowded


The size of the gills (narrow or broad)


The color of the gills


Indicates whether the stalk enlarges or tapers


The shape of the stalk root (equal, club, bulbous, etc.)


The texture of the stalk above the ring


The texture of the stalk below the ring


The color of the stalk above the ring


The color of the stalk below the ring


The veil type (all values are "Partial")


The color of the veil


The number of rings


The ring shape 


The color of the spores


Indicates the relative number of similar mushrooms in the area (abundant, scattered, solitary, etc.)


The type of habitat the mushroom was found in (grasses, urban, etc.)

U.S. Baby Names

These datasets list most names in usage in the U.S. between 1880 and the present, with the count of each name given per year. Only names given to at least 5 babies in the same year are included in the datasets. The first dataset, NationalNames.csv, provides counts of all names throughout the U.S. as a whole, while the second, StateNames.csv, breaks these counts down by individual state (therefore it is significantly larger). 

For more information about these datasets, see the Kaggle page at

Note: If you encounter an error while trying to upload one of these datasets to Domo using the File Upload connector, try saving the file as an Excel file. Be aware, however, that Excel files can only contain up to 1,048,576 rows, and both of these CSV files contain more than this number. Therefore you will not be able to upload the entire dataset when using Excel.  
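
One possible way around the Excel row limit is to split the CSV into several smaller files before converting them. The following Python sketch illustrates the idea using only the standard library. The function name split_rows and the sample rows are made up for illustration, and the demo uses a tiny chunk size so the result is easy to inspect.

```python
import csv
import io
from typing import Iterator, List

EXCEL_MAX_ROWS = 1_048_576  # row limit quoted in the note above


def split_rows(reader: Iterator[List[str]], rows_per_file: int) -> Iterator[List[List[str]]]:
    """Yield chunks of rows, each starting with a copy of the header row."""
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    chunk: List[List[str]] = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_file:
            yield [header] + chunk
            chunk = []
    if chunk:
        yield [header] + chunk


# Made-up sample data; for a real file, pass EXCEL_MAX_ROWS - 1 so the header fits too
sample = "Name,Year,Count\nA,1880,5\nB,1880,6\nC,1880,7\n"
for part in split_rows(csv.reader(io.StringIO(sample)), rows_per_file=2):
    print(part)
```

Each chunk can then be written out with csv.writer and saved as its own Excel file.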

Columns in these datasets are as follows:




An ID number assigned to this name in this year (for purposes of this dataset only)


The baby name


The year in which a count was given for this baby name 


The gender associated with this usage of the name (an important qualifier, since separate counts are given per name per gender)


The count of this name for a specific gender per year  

State (StateNames dataset only)

The state in which a count was made

World Happiness Report

The World Happiness Report is a landmark survey of the state of global happiness. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness. The report continues to gain global recognition as governments, organizations, and civil society increasingly use happiness indicators to inform their policy-making decisions.

The attached CSV datasets provide happiness data for most world nations for the years 2015-2017. For each country, an overall ranking is assigned, and individual scores are provided for criteria such as life expectancy, economy, freedom, and so on. 

For more information about the World Happiness Report and how to interpret the data, see the Kaggle page at

Columns in these datasets are as follows:




The name of the country

Region (2015 and 2016 only)

World region the country belongs to (such as Western Europe, Middle East and Northern Africa, etc.)

Happiness Rank

The country's ranking based on the Happiness Score

Happiness Score

A metric measured by asking sampled people the question "How would you rate your happiness on a scale of 0 to 10 where 10 is the happiest?"

Standard Error (2015 only)

The standard error of the Happiness Score

Lower Confidence Interval (2016 only)

The lower bound of the confidence interval for the Happiness Score

Upper Confidence Interval (2016 only)

The upper bound of the confidence interval for the Happiness Score

Whisker.high (2017 only)

The whisker high score for this country

Whisker.low (2017 only)

The whisker low score for this country

Economy (GDP per Capita)

The extent to which GDP contributes to the calculation of the Happiness Score


The extent to which family contributes to the calculation of the Happiness Score

Health (Life Expectancy)

The extent to which life expectancy contributes to the calculation of the Happiness Score


The extent to which freedom contributes to the calculation of the Happiness Score

Trust (Government Corruption)

The extent to which perception of government corruption contributes to calculation of the Happiness Score


The extent to which generosity contributes to the calculation of the Happiness Score

Dystopia Residual 

The extent to which dystopia residual contributes to the calculation of the Happiness Score (for more information about what constitutes dystopia residual, see the Kaggle page referenced above)
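
Because the Happiness Rank is based on the Happiness Score, you can reproduce it yourself as a check on your data. The sketch below is a minimal Python illustration, assuming a higher score means a happier country and ignoring ties. The function name rank_by_score and the country labels and scores are placeholders, not values from the report.

```python
from typing import Dict


def rank_by_score(scores: Dict[str, float]) -> Dict[str, int]:
    """Assign rank 1 to the highest score, rank 2 to the next, and so on."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: position for position, country in enumerate(ordered, start=1)}


# Placeholder countries and scores for illustration only
print(rank_by_score({"A": 5.1, "B": 7.3, "C": 6.0}))
```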

80 Cereals

This CSV dataset provides nutritional data on 80 different breakfast cereals.  

This dataset was uploaded to Kaggle by user Chris Crawford. Data was gathered and cleaned by Petra Isenberg, Pierre Dragicevic, and Yvonne Jansen. To see the Kaggle page, visit The original source is available here:   

Columns in this dataset are as follows:




The name of the cereal


The manufacturer of the cereal


The type of cereal, either cold or hot


The number of calories per serving


Amount of protein per serving, in grams


Amount of fat per serving, in grams


Amount of sodium per serving, in milligrams


Amount of dietary fiber per serving, in grams


Amount of complex carbohydrates per serving, in grams


Amount of sugar per serving, in grams


Amount of potassium per serving, in milligrams


The vitamin and mineral content per serving, given as the typical percentage of the FDA-recommended amount (either 0, 25, or 100)


The weight of one serving, in ounces


The number of cups in a serving


A rating of the cereal (author unsure of source, possibly Consumer Reports)

Speed Dating Experiment

For this dataset, data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four-minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information.

This dataset was compiled by Columbia Business School professors Ray Fisman and Sheena Iyengar and uploaded to Kaggle by user Anna Montoya. For more information, see the Kaggle page at

This dataset is extremely complex, with over 200 columns. For your convenience in interpreting the columns, a key is provided along with the CSV file containing the data.

Titanic Passengers List

This CSV dataset consists of basic information for 887 passengers aboard the RMS Titanic when it sank in 1912, including name, age, gender, passenger class, fare amount, number of family members aboard, and whether they survived the disaster. 

There is a huge number of user-created datasets publicly available that utilize this information. To view any of these datasets and/or learn more about how Titanic data is being used for machine learning, visit and search for "titanic."

Columns in this dataset are as follows:




Indicates whether this passenger survived (0 is no and 1 is yes)


The passenger class for this person, either 1, 2, or 3


The name of this passenger


The gender of this passenger


The age of this passenger

Siblings/Spouses Aboard

The number of siblings and/or spouses accompanying this passenger

Parents/Children Aboard

The number of parents and/or children accompanying this passenger


The fare paid by this passenger to board, in British pounds (£)
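
Since survival is stored as 0 or 1, the survival rate for any group is simply the average of that column within the group. The following Python sketch shows this for passenger class. The function name survival_rate_by_class and the sample rows are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def survival_rate_by_class(rows: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Given (passenger class, survived flag) pairs, return the rate per class."""
    totals: Dict[int, int] = defaultdict(int)
    survived: Dict[int, int] = defaultdict(int)
    for pclass, flag in rows:
        totals[pclass] += 1
        survived[pclass] += flag  # flag is 0 or 1, so the sum counts survivors
    return {pclass: survived[pclass] / totals[pclass] for pclass in totals}


# Made-up rows: (passenger class, survived)
print(survival_rate_by_class([(1, 1), (1, 0), (3, 0), (3, 0)]))
```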