Code snippets for getting started with machine learning using PyTorch, pandas, NumPy, and Kaggle
To count the missing values in each column, run this in a cell before and after filling with the modes:
df.isna().sum()
Fill with the modes:
# Get the modes for the data frame
modes = df.mode().iloc[0]
# Fill NaN values
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
df.fillna(modes, inplace=True)
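As a minimal sketch of the steps above on made-up data (the column names and values are assumptions for illustration, not taken from any real dataset), the following shows the missing-value counts before and after the fill. It assigns the result of `fillna` instead of using `inplace=True`, which is only a stylistic choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each column has one missing value.
df = pd.DataFrame(
    {
        "age": [22.0, np.nan, 22.0, 35.0],
        "embarked": ["S", "C", None, "S"],
    }
)

missing_before = df.isna().sum()

# mode() returns a DataFrame because a column can have several tied modes;
# .iloc[0] takes the first row, which gives one mode per column.
modes = df.mode().iloc[0]

# fillna matches the Series index (column names) to the DataFrame columns.
df = df.fillna(modes)

missing_after = df.isna().sum()
print(missing_before)
print(missing_after)
```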
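For notebooks that might run on a CUDA GPU, an Apple Silicon Mac (MPS), or a CPU:
import torch
# torch.has_mps is deprecated and only reports whether MPS was built in; is_available() also checks the machine
torch_device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {torch_device}")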
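Once a device is chosen, models and tensors still have to be moved onto it. The sketch below assumes PyTorch 1.12 or later (where `torch.backends.mps` exists); the tiny `nn.Linear` model and the random batch are placeholders for illustration only.

```python
import torch
from torch import nn

# Same preference order as above: CUDA, then MPS, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Placeholder model and batch; both must live on the same device.
model = nn.Linear(3, 2).to(device)
batch = torch.randn(4, 3, device=device)

with torch.no_grad():
    out = model(batch)

print(out.shape, out.device)
```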
For notebooks on a Mac with Apple Silicon (see also “ML on a Mac”)
import torch

# https://pytorch.org/docs/stable/notes/mps.html
if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print(
            "MPS not available because the current PyTorch install was not "
            "built with MPS enabled."
        )
    else:
        print(
            "MPS not available because the current MacOS version is not 12.3+ "
            "and/or you do not have an MPS-enabled device on this machine."
        )
else:
    print("MPS is available. Setting as default device.")
    mps_device = torch.device("mps")
    # Set fastai's `default_device()` to MPS
    # https://github.com/fastai/fastai/blob/0d952d3c234629ec6d6a909186e79af3c5a9a1b8/fastai/torch_core.py#L271
    try:
        default_device(mps_device)
    except NameError:
        # default_device comes from fastai (e.g. `from fastai.torch_core import default_device`)
        print("default_device() is not defined. Did you import `fastai`?")
Use this snippet at the top of a notebook that may run either on Kaggle or somewhere else (locally or on another host):
import os
from pathlib import Path

competition = "titanic"  # Change this to any Kaggle competition name
iskaggle = os.environ.get("KAGGLE_KERNEL_RUN_TYPE", "")
if iskaggle:
    path = Path(f"../input/{competition}")
else:
    import kaggle
    # Use .kaggle_data folders that will be gitignored
    path = Path(".kaggle_data")
    if not path.exists():
        import zipfile
        kaggle.api.competition_download_cli(competition=competition, path=str(path))
        zipfile.ZipFile(f"{path}/{competition}.zip").extractall(path)
print(f"Ready for competition: {competition}")
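One limitation of the snippet above is that a single `.kaggle_data` folder is shared by every competition, so changing `competition` after the folder exists skips the download. A possible variation, sketched here with a hypothetical helper named `competition_data_dir` and an assumed `local_root` parameter, keeps one subfolder per competition. It only resolves the path; downloading and unzipping would still follow the snippet above.

```python
import os
from pathlib import Path


def competition_data_dir(competition: str, local_root: str = ".kaggle_data") -> Path:
    """Return the folder where a competition's files are expected.

    Hypothetical helper: on Kaggle the data is mounted under ../input,
    elsewhere each competition gets its own subfolder of local_root.
    """
    if os.environ.get("KAGGLE_KERNEL_RUN_TYPE", ""):
        return Path("../input") / competition
    return Path(local_root) / competition


path = competition_data_dir("titanic")
print(f"Data folder: {path}")
```

With this layout, the `if not path.exists():` check applies to each competition separately.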