ML snippets

Code snippets for getting started with machine learning, using PyTorch, pandas, NumPy, and Kaggle.

Dec 29, 2022

Tips and approaches

Fill NaN with modes using pandas

To count the NaN values in each column, run this in a cell before and after filling with the modes:

df.isna().sum()

Fill with the modes:

# Get the modes for the data frame
modes = df.mode().iloc[0]

# Fill NaN values
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
df.fillna(modes, inplace=True)
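
To see the effect end to end, here is a minimal sketch on a made-up data frame. The column names and values are hypothetical and only serve the illustration. Note that `df.mode()` can return more than one row when several values tie for most frequent, which is why the snippet takes the first row with `.iloc[0]`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from any real dataset
df = pd.DataFrame(
    {
        "age": [22.0, np.nan, 22.0, 35.0],
        "embarked": ["S", "C", None, "S"],
    }
)

# NaN count per column before filling
nan_before = df.isna().sum()

# mode() ignores NaN by default; row 0 holds the first mode of each column
modes = df.mode().iloc[0]
df = df.fillna(modes)

# NaN count per column after filling
nan_after = df.isna().sum()

print(nan_before.to_dict())
print(nan_after.to_dict())
```

This sketch assigns the result of `fillna` back to `df` instead of using `inplace=True`; both approaches fill the same values.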

Use Apple’s M1/M2 GPUs (Apple Silicon) with PyTorch’s MPS backend

For notebooks that might run on a CUDA GPU, an Apple Silicon Mac, or a plain CPU:

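import torch

# Prefer CUDA, then Apple's MPS backend, then fall back to the CPU
torch_device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"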
print(f"Using device: {torch_device}")
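
Once a device is chosen, tensors and models need to be created on it or moved to it. The following is a small sketch of that step, not a definitive recipe. The helper name `pick_device` is hypothetical, and the `getattr` guard is an assumption to keep the code working on older PyTorch builds that do not ship `torch.backends.mps`.

```python
import torch


def pick_device() -> torch.device:
    """Return the best available device: CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Older PyTorch builds may not have torch.backends.mps at all
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Create a tensor directly on the chosen device and run a small computation
x = torch.ones(2, 3, device=device)
total = (x * 2).sum().item()

print(f"Using device: {device}, total: {total}")
```

Models are moved the same way with `model.to(device)`.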

For notebooks that run only on a Mac with Apple Silicon (see also “ML on a Mac”):

import torch

# https://pytorch.org/docs/stable/notes/mps.html
if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print(
            "MPS not available because the current PyTorch install was not "
            "built with MPS enabled."
        )
    else:
        print(
            "MPS not available because the current MacOS version is not 12.3+ "
            "and/or you do not have an MPS-enabled device on this machine."
        )

else:
    print("MPS is available. Setting as default device.")
    mps_device = torch.device("mps")

    # Set fastai's `default_device()` to MPS
    # https://github.com/fastai/fastai/blob/0d952d3c234629ec6d6a909186e79af3c5a9a1b8/fastai/torch_core.py#L271
    try:
        default_device(mps_device)
    except NameError:
        print("default_device() is not defined. Did you import `fastai`?")

Kaggle competition snippet

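Use this snippet at the top of a notebook, whether it runs on Kaggle or elsewhere. On Kaggle it points at the competition's input folder; elsewhere it downloads the data through the Kaggle API.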

import os
from pathlib import Path

competition = "titanic"  # Change this to any Kaggle competition name
iskaggle = os.environ.get("KAGGLE_KERNEL_RUN_TYPE", "")

if iskaggle:
    path = Path(f"../input/{competition}")
else:
    import kaggle

    # Use .kaggle_data folders that will be gitignored
    path = Path(".kaggle_data")

    if not path.exists():
        import zipfile

        kaggle.api.competition_download_cli(competition=competition, path=str(path))
        # Close the archive automatically once extraction finishes
        with zipfile.ZipFile(path / f"{competition}.zip") as archive:
            archive.extractall(path)

print(f"Ready for competition: {competition}")
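
The environment check in the snippet above can also be factored into a small function so the path logic can be exercised without Kaggle or a download. This is a sketch under assumptions: the helper name `competition_path` and its `env` parameter are hypothetical, and `"Interactive"` is used only as a sample non-empty value for `KAGGLE_KERNEL_RUN_TYPE`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def competition_path(
    competition: str, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return the data folder for a competition, on or off Kaggle."""
    # Default to the real environment; a dict can be passed in for testing
    env = os.environ if env is None else env
    if env.get("KAGGLE_KERNEL_RUN_TYPE", ""):
        # Kaggle mounts competition data under ../input/<competition>
        return Path(f"../input/{competition}")
    # Local folder that should be gitignored
    return Path(".kaggle_data")


on_kaggle = competition_path("titanic", {"KAGGLE_KERNEL_RUN_TYPE": "Interactive"})
off_kaggle = competition_path("titanic", {})

print(on_kaggle)
print(off_kaggle)
```

Passing the environment in as a mapping keeps the function free of side effects; calling it with no `env` argument reads `os.environ` as the original snippet does.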

