{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 2: Step-by-step scATAC-seq analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we use the scATAC-seq dataset `Forebrain` as an example to illustrate how scAGDE performs scATAC-seq analysis step by step." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Read and preprocess data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first read the `.h5ad` data file using the [Scanpy](https://github.com/scverse/scanpy) package." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import scanpy as sc\n", "adata = sc.read_h5ad(\"data/Forebrain.h5ad\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use Scanpy to further filter the data. In our case, this step is not needed because the loaded dataset has already been preprocessed. The filtering code is included below for reference:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Keep cells with at least 100 detected features (peaks; Scanpy calls them genes)\n", "sc.pp.filter_cells(adata, min_genes=100)\n", "# Keep features detected in at least 1% of the remaining cells\n", "min_cells = int(adata.shape[0] * 0.01)\n", "sc.pp.filter_genes(adata, min_cells=min_cells)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 2088 × 11285\n", " obs: 'celltype', 'n_genes'\n", " var: 'n_cells'\n", " uns: 'neighbors', 'pca', 'tsne', 'umap'\n", " obsm: 'X_pca', 'X_tsne', 'X_umap'\n", " varm: 'PCs'\n", " obsp: 'connectivities', 'distances'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Set up and train the scAGDE model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can initialize the trainer with the AnnData object, which ensures that the model settings are in place for training. 
\n", "\n", "We can set `outdir` to the directory where the output files (mainly the model weights file) should be saved.\n", "\n", "`n_centroids` is the number of clusters in the dataset. If this number is unknown, we can set `n_centroids=None`, in which case scAGDE applies its estimation strategy to determine the optimal number of clusters for initializing its cluster layer. Here, we set `n_centroids=None` to enable the estimation strategy.\n", " \n", "We can train scAGDE on a specific device by setting `gpu`. For example, train scAGDE on the CPU with `gpu=None`, or on GPU #0 with `gpu=\"0\"`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "