Domo Knowledge Base

AutoML (Machine Learning)

Version 5

 

Intro

Domo leads the way in making machine learning accessible to everyone.

1.png

You depend on Domo to access your data from a wide variety of sources and make it usable in record time. Machine learning (ML) is the next step to provide insight to complex business problems, improve decision-making, and automate business processes. However, many organizations don’t have the data science expertise to create a machine learning model for their data. Additionally, organizations who have data science teams are often drowning in opportunities and spend months exploring different business problems without ever getting their machine learning solutions into production. AutoML provides a solution in these situations.

With Domo’s Data Science Suite, you can go from data to models to business outcomes—faster. Data science models can be prepared, tested, and fully deployed into production. Domo can do the heavy lifting with AutoML—by automatically generating and testing machine learning models until it finds the best fit for any business outcome.

Automated machine learning (AutoML) automates the process of applying machine learning to real-world business problems. AutoML covers the complete pipeline from the raw DataSet to the deployable machine learning model. It was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. The high degree of automation in AutoML allows non-experts to use machine learning models and techniques without first becoming experts in the field.

Automating the end-to-end process of applying machine learning offers further advantages: simpler solutions, faster creation of those solutions, and models that often outperform hand-designed models.

Domo and AWS: A partnership for success

2.png

We’re thrilled to announce that machine learning is now within everyone’s grasp with our new automated machine learning (AutoML) feature, included in our data science suite. In partnership with Amazon SageMaker Autopilot, we’ve created AutoML capabilities that allow you to augment your analytics with machine learning. Whether you’re a data scientist or a data science novice, the feature enables you to go from data to models to outcomes faster, while creating astute data products for the enterprise.

Amazon SageMaker Autopilot is an Amazon Web Services (AWS) solution that automatically trains and tunes ML models based on data provided by a customer. Companies can now use their data in Domo as input into Amazon SageMaker Autopilot, automatically create the highest performing model and deploy a prediction pipeline that adapts to new, incoming data. The combination of Domo and Amazon SageMaker Autopilot helps make ML accessible to more employees and propels ML-driven insights for business.

Getting this feature

If you are interested in joining the Beta for this feature, please contact your Customer Success Manager (CSM).

How can data science and machine learning (DSML) help your business or organization?

The benefits of machine learning include:

  1. The ability to optimize business outcomes.
  2. A better understanding of the deeper drivers of results through insights.
  3. More accurate predictions for a process or problem.
  4. Automated decision making that frees human capital to focus on other priorities.
  5. Insights into potential causality that can be explored with further experimentation and analysis.

What does AutoML do for you?

Domo AutoML automatically trains and tunes the best machine learning models for classification or regression based on your data, while allowing you to maintain full control and visibility.

Building machine learning models has traditionally required a binary choice. On one hand, you could manually prepare the features, select the algorithm, and optimize the model parameters in order to have full control over the model design and understand all the thought that went into creating it. However, this approach requires deep ML expertise. On the other hand, if you don’t have that expertise, you could use an automated approach (AutoML) to model generation that takes care of all of the heavy lifting, but provides very little visibility into how the model was created. While a model created with AutoML can work well, you may have less trust in it because you can’t understand what went into it, you can’t recreate it, and you can’t learn best practices which may help you in the future.

Domo AutoML eliminates this choice, allowing you to automatically build machine learning models without compromises. With Domo AutoML, you provide a tabular DataSet and select the target column to predict. The target can be a number (such as a house price), in which case the task is called regression, or a category (such as spam/not spam), in which case it is called classification. Domo AutoML will automatically explore different solutions to find the best model. You can then deploy the model directly to production with just one click. For more information on Amazon SageMaker Autopilot, see https://aws.amazon.com/sagemaker/autopilot/
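To make the regression/classification distinction concrete, here is a minimal sketch of how a target column could be mapped to a task type. The function name `infer_task_type` and its rules are assumptions made for illustration; this is not how Domo AutoML or Amazon SageMaker Autopilot actually decide, and a real detector would need to handle edge cases such as numeric category codes.

```python
def infer_task_type(target_values):
    """Guess the ML task from the values in a target column.

    Illustrative heuristic only (an assumption, not Domo's or Autopilot's logic):
    - exactly two distinct values -> binary classification
    - all values numeric          -> regression
    - anything else               -> multiclass classification
    """
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column has no values")
    if len(set(values)) == 2:
        return "binary classification"
    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if all_numeric:
        return "regression"
    return "multiclass classification"


# Hypothetical target columns.
house_prices = [250000.0, 310500.0, 199900.0, 420000.0]
spam_labels = ["spam", "not spam", "spam", "not spam"]
plan_tiers = ["basic", "plus", "premium", "plus"]

print(infer_task_type(house_prices))
print(infer_task_type(spam_labels))
print(infer_task_type(plan_tiers))
```

The three sample columns correspond to the house-price and spam examples in the text, plus a hypothetical multi-category column.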

How AutoML works

3.png

Domo leverages Amazon SageMaker Autopilot to make it easy to automatically train and tune the ML models needed to predict outcomes. With just a few clicks, AutoML will transform your data to be ready for machine learning and launch hundreds of training jobs on any DataSet in Domo to find the model that achieves the best performance for your task. You can then easily deploy the model on your Domo DataSets with the new AutoML Inference tile in Magic ETL. All you need to succeed in machine learning with AutoML is a well-defined problem and a clean DataSet; Domo will take care of the rest.

Check out this video to see it in action.

Steps to use AutoML

  1. Log into your Domo instance.
  2. Align your data toward your business objective.
  3. Prepare your DataSet (details on data prep can be found here.)
  4. Create a target column for training.
  5. Select Machine Learning task.
  6. Configure candidate run and target objective.
  7. Launch an AutoML training job.
  8. Evaluate model performance on the Model Leaderboard Page.
  9. Interpret results of model overview.
  10. Launch a deploy configuration.
  11. Configure tile run predictions.
  12. If the model performs adequately for your business objective, deploy the AutoML Inference tile in Magic ETL.
  13. Schedule the AutoML tiles.
  14. Set up model monitoring Dashboards.
  15. Create an app or Story on predictions for stakeholders.

Prepare data for AutoML

Take some time to align with your business stakeholders so you have a thorough understanding of your business problems, which tasks would be worthwhile to automate, what model performance would be acceptable for your business problem, and what data is available. Then gather your data into one DataSet that has one column with the output variable (the variable you want to predict, which AutoML will train on) and as many columns as needed for the inputs that you expect to influence the output column. Be sure to include only one output column in your DataSet. Additionally, each row should represent one record in your business problem. For example, if you would like to predict which sales opportunities will close, each row should describe one distinct sales opportunity, from start to finish, with the output column listing the result of the sales opportunity (i.e., won or lost).
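As a rough illustration of the layout described above, the following sketch checks a small table for two common problems: a missing value in the output column and more than one row for the same record. The column names (`opportunity_id`, `deal_size`, `region`, `result`) and the helper `check_training_rows` are hypothetical; adapt them to your own DataSet.

```python
def check_training_rows(rows, id_column, target_column):
    """Return a list of problems found in a table given as a list of dicts.

    Checks that every row has a value in the target (output) column and
    that each record id appears on exactly one row.
    """
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        if target_column not in row or row[target_column] in (None, ""):
            problems.append(
                f"row {index}: missing value in target column '{target_column}'"
            )
        record_id = row.get(id_column)
        if record_id in seen_ids:
            problems.append(f"row {index}: duplicate record id '{record_id}'")
        seen_ids.add(record_id)
    return problems


# Hypothetical sales-opportunity data: one row per opportunity,
# with 'result' as the single output column.
opportunities = [
    {"opportunity_id": "A-1", "deal_size": 12000, "region": "West", "result": "won"},
    {"opportunity_id": "A-2", "deal_size": 4500, "region": "East", "result": "lost"},
    {"opportunity_id": "A-2", "deal_size": 4500, "region": "East", "result": "lost"},
    {"opportunity_id": "A-3", "deal_size": 8000, "region": "West", "result": ""},
]

problems = check_training_rows(opportunities, "opportunity_id", "result")
for problem in problems:
    print(problem)
```

Here the third row repeats opportunity A-2 and the fourth row has no result, so both are reported.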

There are many wonderful blog posts, courses, books, and videos on data cleaning for machine learning that you can find online. Domo’s professional services team is also always at your disposal to help you enhance your data cleaning skillset.

How to Launch an AutoML Training Job

Below are instructions on how you can use AutoML along with a sample DataSet for you to try. This is a DataSet outlining customer churn at a phone company, originally obtained from the UCI machine learning repository.

Click on this link to download the test DataSet and then upload it to your instance as a new DataSet.

  1. Once your DataSet is uploaded to Domo, you should be automatically directed to the DataSet Details page for your new Churn DataSet. Click on the AutoML tab or select the Train New Algorithm option and then click Get started.

    1.png

    2.png
     
  2. AutoML will ask you which column you want to automate. You want to predict customer churn, which is based on the “class” column in the DataSet. Choose "class" in the column drop down.

    3.png
     
  3. Once your Target Column (“class”) has been selected, determine what training task you would like AutoML to run. For our scenario, you can choose Binary Classification, but if you are ever unsure, AutoML will automatically detect the task type for you. In the advanced menu, you can specify the number of candidates you would like to run. This gives you direct control over the tradeoff between how long your job runs and the maximum model performance you can obtain. Additionally, when you select the task type, you also have the option in the advanced menu to select a different objective metric. For more information on objective metrics, see https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobObjective.html.

    4.png

    5.png

    6.png

    7.png
     
  4. Click Start Training and then sit back and watch as Mr Roboto does all the work for you!

     

How to evaluate model performance

On the Model Overview page, the top-performing model will automatically be highlighted for you. On this page you can compare the many different trials that AutoML ran on your behalf. You can view model performance against the training and validation sets and look at which hyperparameters produced the best-performing model. If performance is acceptable for your business problem, you can deploy the model in production via a Magic ETL DataFlow. This will allow you to make predictions against your data every time the data updates.

Using the Model Overview page

The Model Overview page should be used to determine whether a trained model has reached a satisfactory level of quality to be used for automated or human decision making.

Model Diagnostics

The results from the hyperparameter search are displayed in the “Model Test Results” pane. In this view, you can see all of the experiments AutoML explores. The model results are sorted by the evaluation metric (e.g., AUC for classification and MSE for regression) computed against the validation DataSet. Scrolling right reveals the entire set of parameter values for a given run. Scrolling down reveals all of the candidate models from the AutoML training run.

11.png

In the figure above, candidate model 21 yielded the lowest validation MSE and is the best-performing model on unseen data. Additionally, model 21 is highlighted in the “Best Candidate” pane at the top of the screen. Note that in all cases in the figure above, the validation MSE is higher than the training MSE. This is because the model has never “seen” the validation data before (see Train/Validation Split). In the case of model 1, this difference is quite large, indicating overfitting issues. When comparing two models, we should always compare them on a relative basis using the validation error as the criterion.
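The comparison described above can be sketched in a few lines. The MSE function follows the standard definition; the candidate names and error values are made up for illustration and are not the numbers from the figure.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Small worked example: errors are 1 and -2, so MSE = (1 + 4) / 2.
example_mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])
print("example MSE:", example_mse)

# Hypothetical leaderboard rows (illustrative values only).
candidates = [
    {"name": "candidate A", "train_mse": 0.5, "validation_mse": 4.0},
    {"name": "candidate B", "train_mse": 1.5, "validation_mse": 2.0},
    {"name": "candidate C", "train_mse": 2.0, "validation_mse": 2.5},
]

# Rank by validation error, which reflects performance on unseen data.
ranked = sorted(candidates, key=lambda c: c["validation_mse"])

for candidate in ranked:
    # A large gap between validation and training error suggests overfitting.
    gap = candidate["validation_mse"] - candidate["train_mse"]
    print(f"{candidate['name']}: validation={candidate['validation_mse']}, gap={gap}")
```

In this made-up data, candidate A has the lowest training error but the largest gap, so it ranks last once validation error is used as the criterion.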

One key question should be, how useful is my model in an absolute sense? We have previously discussed relative comparisons between models, but have yet to address when a model is practically “good enough.” We can generally try two strategies to answer this question.

First, we can compare our model to random guessing. For example, suppose we are building a model to predict the result of flipping a coin. Guessing totally at random would yield 50% accuracy, on average. In order for a model to be of use to us, it should predict with greater than 50% accuracy. For classification problems, we can look at the AUC. An AUC of 0.5 is equivalent to random guessing. Any model we build should outperform this number.
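To see why 0.5 is the reference point, the sketch below computes AUC directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels and scores are hypothetical, and this simple quadratic-time version is meant for illustration rather than large DataSets.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model scores, where higher means "more likely positive".
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores.
labels = [1, 0, 1, 0, 1, 0]
informative_scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.1]
constant_scores = [0.5] * 6  # a "model" that cannot tell the classes apart

print("informative model AUC:", auc(labels, informative_scores))
print("uninformative model AUC:", auc(labels, constant_scores))
```

A model that gives every row the same score cannot rank positives above negatives, so it lands at exactly 0.5; the informative scores rank 8 of the 9 pairs correctly.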

Second, we might ask, “How well is the model performing compared to my current processes?” To begin with, we should determine whether we are measuring the performance of our current process. If the answer is no, we should start. Once we have an idea of the current performance baseline, we can experiment with more advanced models. If our model does not outperform the status quo, it is of little use to us. This can be somewhat more challenging because we need to compare our current process to the model-based one. Suppose we are again predicting the outcome of a coin flip. Suppose further that our current process is to ask our mathematician friend what the outcome will be, which on average yields 55% accuracy. To be of any use, our model should outperform this process. The challenge here is that we need to collect data (predicted values) and manually compare them to the ground truth by computing the AUC or MSE. This comparison should be done wherever possible.
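A minimal sketch of that comparison follows. It uses plain accuracy for simplicity, in keeping with the coin-flip example; the same pattern applies if you compute AUC or MSE instead. All of the recorded outcomes and predictions are invented for illustration, and ten observations are far too few for a real decision.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the ground truth."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


# Hypothetical ground truth and the predictions collected from each process.
outcomes = ["H", "T", "H", "H", "T", "T", "H", "T", "H", "T"]
current_process = ["H", "T", "T", "H", "H", "T", "T", "H", "T", "T"]
model_predictions = ["H", "T", "H", "H", "T", "H", "H", "T", "T", "T"]

baseline_accuracy = accuracy(outcomes, current_process)
model_accuracy = accuracy(outcomes, model_predictions)

print("current process accuracy:", baseline_accuracy)
print("model accuracy:", model_accuracy)
print("model beats current process:", model_accuracy > baseline_accuracy)
```

The key point is that both sets of predictions are scored against the same ground truth, so the two numbers are directly comparable.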

In summary, we should use the Model Overview and perform the following diagnostics:

  1. Look at the difference between validation and training error to evaluate overfitting.
  2. Compare to random guessing.
  3. Compare the model error rate to our current business process. 

How to interpret the Model Overview (leaderboard) page

The Model Overview page summarizes output from Domo’s AutoML platform. The key idea behind Domo’s AutoML feature is that we can automatically train hundreds of models, each with a different setting. The Model Overview page allows us to view key statistics related to the different training runs and determine whether we are ready to deploy the highest-performing model.

Deploy any candidate from the Model Overview page

AutoML will automatically select the model with the best validation performance as the best model to deploy in ETL. However, you can also deploy any other model in the leaderboard grid by simply clicking on it and clicking the deploy button. In this way you can deploy a model whose validation and training performance are more closely aligned, which could lead to deploying a model that is less overfit.
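One way to reason about such a choice is sketched below: among the candidates whose validation error is close to the best, prefer the one with the smallest gap between validation and training error. The function `pick_less_overfit`, the tolerance value, and the leaderboard numbers are all assumptions for illustration; Domo AutoML does not apply this rule for you, and the right tolerance depends on your business problem.

```python
def pick_less_overfit(candidates, tolerance):
    """Among candidates whose validation MSE is within `tolerance` of the
    best validation MSE, return the one with the smallest gap between
    validation and training MSE."""
    best_validation = min(c["validation_mse"] for c in candidates)
    shortlist = [
        c for c in candidates if c["validation_mse"] <= best_validation + tolerance
    ]
    return min(shortlist, key=lambda c: c["validation_mse"] - c["train_mse"])


# Hypothetical leaderboard rows (illustrative values only).
leaderboard = [
    {"name": "candidate X", "train_mse": 0.2, "validation_mse": 2.0},
    {"name": "candidate Y", "train_mse": 1.75, "validation_mse": 2.25},
    {"name": "candidate Z", "train_mse": 2.5, "validation_mse": 3.5},
]

# Accept up to 0.5 extra validation error in exchange for a smaller gap.
choice = pick_less_overfit(leaderboard, tolerance=0.5)
print("selected:", choice["name"])
```

With a tolerance of zero, the function falls back to the candidate with the best validation error, which matches the default selection described above.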

8.png

How to make predictions on new data via the AutoML Inference tile in Magic ETL

Connect a DataSet whose schema matches that of your training DataSet, and the AutoML Inference tile will use the machine learning model trained by AutoML to make predictions on all the rows of your DataSet.

9.png

 

If you attempt to connect a DataSet that does not match the schema the training DataSet had at the time you executed your training job, an error message will be displayed listing the mismatches.
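If you want to check a DataSet yourself before connecting it, a comparison along the following lines can help. This is only a sketch of the idea, not Domo's actual validation: the column names, the type labels (`LONG`, `STRING`, `DOUBLE`), and the helper `schema_mismatches` are hypothetical, and the example assumes the schemas list input columns only.

```python
def schema_mismatches(training_schema, incoming_schema):
    """Compare two schemas given as {column name: type name} dicts and
    return a list of human-readable differences."""
    problems = []
    for column, expected_type in training_schema.items():
        if column not in incoming_schema:
            problems.append(f"missing column '{column}'")
        elif incoming_schema[column] != expected_type:
            problems.append(
                f"column '{column}' has type {incoming_schema[column]}, "
                f"expected {expected_type}"
            )
    for column in incoming_schema:
        if column not in training_schema:
            problems.append(f"unexpected column '{column}'")
    return problems


# Hypothetical schemas for illustration.
training_schema = {
    "account_length": "LONG",
    "international_plan": "STRING",
    "total_day_minutes": "DOUBLE",
}
incoming_schema = {
    "account_length": "STRING",
    "international_plan": "STRING",
    "region": "STRING",
}

mismatches = schema_mismatches(training_schema, incoming_schema)
for mismatch in mismatches:
    print(mismatch)
```

An empty result means the two schemas agree on both column names and types.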

10.png

If you deploy a model from a blank Magic ETL canvas by dragging an unconfigured AutoML Inference tile from the Data Science section of the Magic ETL left rail, you will be able to select from the DataSets with a successful AutoML job and then choose the individual job run. However, with this option you will only be able to deploy the model that AutoML deems best (the model with the best-performing objective metric on the AutoML validation data). To deploy any other candidate, go back to the Model Overview page, click on your model in the leaderboard grid, and click the Deploy to ETL button.