Best Practices for AutoML
Machine Learning process for success
What are the most common approaches in machine learning, and how can I leverage them for business growth and success? The following are the most common steps used when applying machine learning to business problems.
- Data Collection
- The quantity and quality of your data determine how accurate your model can be.
- The outcome of this step is generally a representation of the data that you will use for training.
- You can use pre-collected data, such as datasets from your business or public samples from Kaggle, the UCI Machine Learning Repository, etc.
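As a minimal sketch of checking quantity and quality at collection time, the snippet below loads a small CSV with Python's standard `csv` module and counts rows, missing values per column, and exact duplicate rows. The inline `RAW_CSV` data and the `summarize` helper are hypothetical stand-ins for a real export from your business systems or a public dataset.

```python
import csv
import io

# Hypothetical sample; in practice this would be a file from your business
# systems or a dataset downloaded from a public source.
RAW_CSV = """sqft,bedrooms,price
1400,3,240000
1600,,265000
1700,3,
1400,3,240000
"""


def summarize(rows, columns):
    """Report dataset size, missing values per column, and duplicate rows."""
    missing = {c: sum(1 for r in rows if r[c].strip() == "") for c in columns}
    unique_rows = {tuple(r.values()) for r in rows}
    return {
        "n_rows": len(rows),
        "missing": missing,
        "duplicates": len(rows) - len(unique_rows),
    }


reader = csv.DictReader(io.StringIO(RAW_CSV))
rows = list(reader)
report = summarize(rows, reader.fieldnames)
print(report)
```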
- Data Preparation
- Curate data and prepare it for training.
- Clean the data where needed (remove duplicates, correct errors, handle missing values, normalize, convert data types, etc.).
- Randomize (shuffle) the data to erase the effects of the particular order in which it was collected or otherwise prepared.
- Visualize data to help detect relevant relationships between variables or class imbalances (bias alert!), or perform other exploratory analysis.
- Split into training and evaluation sets.
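The preparation steps above can be sketched in plain Python as follows. This assumes a hypothetical list of `(feature, label)` records with `None` marking missing values; `prepare` is an illustrative helper, not a library function. One design choice differs slightly from the order listed: min-max normalization is fitted on the training split only, a common way to keep evaluation data from leaking into training.

```python
import random

# Hypothetical (feature, label) records; None marks a missing value.
records = [(1.0, 2.1), (2.0, 3.9), (2.0, 3.9), (3.0, None), (4.0, 8.2),
           (5.0, 9.8), (6.0, 12.1), (7.0, 14.2), (8.0, 15.9), (9.0, 18.1)]


def prepare(records, train_frac=0.8, seed=42):
    """Clean, shuffle, split, and normalize records (illustrative helper)."""
    # 1. Clean: drop exact duplicates and rows with missing values.
    seen, cleaned = set(), []
    for rec in records:
        if rec in seen or None in rec:
            continue
        seen.add(rec)
        cleaned.append(rec)

    # 2. Randomize order so the collection order does not influence training.
    random.Random(seed).shuffle(cleaned)

    # 3. Split into training and evaluation sets (e.g. 80/20).
    cut = int(len(cleaned) * train_frac)
    train, evaluation = cleaned[:cut], cleaned[cut:]

    # 4. Min-max normalize the feature using training-set statistics only.
    lo = min(x for x, _ in train)
    hi = max(x for x, _ in train)

    def scale(rows):
        return [((x - lo) / (hi - lo), y) for x, y in rows]

    return scale(train), scale(evaluation)


train, evaluation = prepare(records)
print(len(train), len(evaluation))  # 8 clean rows -> 6 train, 2 evaluation
```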
- Choose a Model
- Different algorithms suit different tasks (for example, classification, regression, or clustering); choose one that fits your problem and data.
- Train the Model
- The goal of training is to answer a question or make a prediction correctly as often as possible.
- Linear regression example: for y = mx + b, where x is the input and y is the output, the algorithm needs to learn values for m (or the weights W) and b.
- Each iteration of the process is a training step.
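To make the linear regression example concrete, here is a minimal sketch that learns m and b by batch gradient descent on mean squared error, assuming noise-free toy data generated from y = 2x + 1. Gradient descent is one common way to fit this model (a closed-form least-squares solution also exists); the function name and default hyperparameters are illustrative.

```python
def train_linear_regression(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = m*x + b by batch gradient descent on mean squared error."""
    m, b = 0.0, 0.0  # initialization values (a hyperparameter choice)
    n = len(xs)
    for _ in range(steps):  # each loop iteration is one training step
        # Gradients of MSE = (1/n) * sum((m*x + b - y)^2) with respect to m and b.
        errors = [m * x + b - y for x, y in zip(xs, ys)]
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Hypothetical data generated from y = 2x + 1 (no noise), so m ~ 2 and b ~ 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
m, b = train_linear_regression(xs, ys)
print(f"m={m:.3f}, b={b:.3f}")
```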
- Evaluate the Model
- Use a metric or combination of metrics to objectively "measure" the model's performance.
- Test the model against previously unseen data.
- This evaluation data is meant to be somewhat representative of model performance in the real world, but it is still used to tune the model (unlike test data, which is not).
- A good train/eval split is 80/20, 70/30, or similar, depending on the domain, data availability, dataset particulars, etc.
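As one possible choice of metrics for a regression model, the sketch below computes mean squared error and R² on a held-out evaluation set. The fitted model (`predict`) and the evaluation values are hypothetical; classification problems would typically use metrics such as accuracy, precision, or recall instead.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between known and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Fraction of the variance in y_true that the predictions explain."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def predict(x):
    # Hypothetical model already trained on the training set.
    return 2.0 * x + 1.0


# Hypothetical evaluation data the model has not seen during training.
eval_x = [5.0, 6.0, 7.0]
eval_y = [11.2, 12.9, 15.1]

preds = [predict(x) for x in eval_x]
mse = mean_squared_error(eval_y, preds)
r2 = r_squared(eval_y, preds)
print(f"MSE={mse:.3f}, R^2={r2:.3f}")
```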
- Parameter Tuning
- This step refers to hyperparameter tuning, which is more of an "art form" than a science.
- Tune model hyperparameters for improved performance.
- Simple model hyperparameters may include the number of training steps, learning rate, initialization values and their distribution, etc.
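A simple, systematic starting point for hyperparameter tuning is a grid search: train once per combination and keep the one with the best evaluation score. The sketch below assumes a toy gradient-descent linear regression and a hypothetical grid over learning rate and number of training steps; the test set is deliberately not touched here.

```python
import itertools


def train(xs, ys, learning_rate, steps):
    """Batch gradient descent for y = m*x + b (toy model for illustration)."""
    m = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [m * x + b - y for x, y in zip(xs, ys)]
        m -= learning_rate * (2 / n) * sum(e * x for e, x in zip(errors, xs))
        b -= learning_rate * (2 / n) * sum(errors)
    return m, b


def mse(m, b, xs, ys):
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical train/eval data from y = 2x + 1.
train_x, train_y = [0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0]
eval_x, eval_y = [5.0, 6.0], [11.0, 13.0]

grid = {"learning_rate": [0.001, 0.01, 0.05], "steps": [10, 100, 1000]}
best = None
for lr, steps in itertools.product(grid["learning_rate"], grid["steps"]):
    m, b = train(train_x, train_y, lr, steps)
    score = mse(m, b, eval_x, eval_y)  # tune on the evaluation set, never the test set
    if best is None or score < best[0]:
        best = (score, lr, steps)

best_score, best_lr, best_steps = best
print(f"best eval MSE {best_score:.4f} with learning_rate={best_lr}, steps={best_steps}")
```

Grid search grows quickly with the number of hyperparameters; random search or other search strategies are common alternatives when the grid becomes large.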
- Make Predictions
- Use further data (the test set), which has been withheld from the model until this point and for which class labels are known, to test the model; this gives a better approximation of how the model will perform in the real world.
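The final check on withheld test data can be sketched as below, assuming a hypothetical binary classifier whose threshold was already fixed using only training and evaluation data. Accuracy is used here only as an example metric, and the test scores and labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of test examples whose predicted class matches the known label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical final model: predicts class 1 above a threshold chosen earlier
# using only the training and evaluation sets.
THRESHOLD = 0.7


def predict(score):
    return 1 if score > THRESHOLD else 0


# Withheld test set: never used for training or tuning; labels are known.
test_scores = [0.9, 0.2, 0.75, 0.4, 0.65, 0.1]
test_labels = [1, 0, 1, 0, 1, 0]

predictions = [predict(s) for s in test_scores]
test_accuracy = accuracy(test_labels, predictions)
print(f"test accuracy: {test_accuracy:.2f}")
```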
Best practices in Machine Learning for business
- A Machine Learning solution maps input→output.
- Always ask and explore how well your data explains (predicts) your output.
- Is performance better than your current process?
- Consider implementing.
- Perform a cost-benefit analysis of further model improvement versus its development cost to the business (if the cost is low, keep going).
- Not better than your current process?
- Improve your training data.
- Optimize your ML method.
- Learn more about how your data represents your domain.
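The decision flow above can be expressed as a small function. This is a hypothetical helper: its inputs (model and current-process error on the same metric, expected benefit, and development cost in the same currency) are assumptions you would need to estimate for your own business, and the cost-benefit rule is a simplified reading of the checklist.

```python
def next_step(model_error, current_error, expected_benefit, development_cost):
    """Hypothetical helper mirroring the business checklist.

    Errors use the same metric (lower is better); benefit and cost use the
    same currency. All inputs are estimates you supply.
    """
    if model_error < current_error:
        # Better than the current process: consider implementing, then weigh
        # further improvement against its development cost.
        if development_cost < expected_benefit:
            return "implement and keep improving"
        return "implement"
    # Not better: improve training data, optimize the ML method,
    # and learn more about how the data represents the domain.
    return "improve data and method"


print(next_step(0.12, 0.20, expected_benefit=50_000, development_cost=10_000))
```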