{ "cells": [ { "cell_type": "markdown", "id": "b0000bed-ebd4-4f95-8d63-c87cba8565ac", "metadata": {}, "source": [ "# Pinder index" ] }, { "cell_type": "markdown", "id": "2c23b631-1861-426b-9510-2888cf3aee5b", "metadata": {}, "source": [ "## Download the dataset\n", "\n", "NOTE: the default location for the dataset is `~/.local/share/pinder/`.\n", "\n", "To use a different location, set the `PINDER_BASE_DIR` environment variable.\n", "\n", "The base dir is the fully qualified path up to, but not including, the ``.\n", "\n", "For instance, you could run:\n", "```\n", "export PINDER_BASE_DIR=~/my-custom-location-for-pinder/pinder\n", "```\n", "\n", "You can always check the current location of the dataset like so:\n", "```python\n", "from pinder.core import get_pinder_location\n", "get_pinder_location()\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "id": "ecb996d2-2676-4eee-8636-01e26227bdeb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02')" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pinder.core import get_pinder_location\n", "get_pinder_location()\n" ] }, { "cell_type": "markdown", "id": "1da603c9-4896-4f4c-be22-b1d79af23808", "metadata": {}, "source": [ "### To download the complete dataset, run the following" ] }, { "cell_type": "code", "execution_count": 2, "id": "301ca179-0394-4106-be38-7ddab0caa16d", "metadata": {}, "outputs": [], "source": [ "from pinder.core import download_dataset\n", "# Uncomment the next line to download the complete dataset\n", "# download_dataset()\n" ] }, { "cell_type": "markdown", "id": "ada695c7-923c-4f58-9e8f-2edd715a31e6", "metadata": {}, "source": [ "### Alternatively, use the CLI script `pinder_download`\n", "\n", "```bash\n", "pinder_download --help\n", "\n", "usage: Download latest pinder dataset to disk [-h] [--pinder_base_dir PINDER_BASE_DIR] [--pinder_release PINDER_RELEASE] [--skip_inflation]\n", "\n", "optional arguments:\n", " -h, --help show this help message and exit\n", " --pinder_base_dir PINDER_BASE_DIR\n", " specify a non-default pinder base directory\n", " --pinder_release PINDER_RELEASE\n", " specify a pinder dataset version\n", " --skip_inflation if passed, will only download the compressed archives without unpacking\n", "```" ] }, { "cell_type": "markdown", "id": "31599fbd-5d69-4710-bf42-960ecf2fdc5d", "metadata": {}, "source": [ "### The full dataset should look like this:\n", "\n", "```bash\n", "~/.local/share/pinder//\n", " pdbs/\n", " csvs/\n", " index.csv.gz\n", "```" ] }, { "cell_type": "markdown", "id": "b629c20c-5a9f-427c-81ae-e7444b873372", "metadata": {}, "source": [ "## Pinder metadata API" ] }, { "cell_type": "code", "execution_count": 3, "id": "7c5ad9f9-c15b-40c7-bc1b-7a4d24bd9d4e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id entry_id \\\n", "0 7rzb__A1_A0A229LVN5--7rzb__A2_A0A229LVN5 7rzb \n", "1 3t2l__A1_Q5LE95--3t2l__A2_Q5LE95 3t2l \n", "2 6ikj__A1_Q9I4L6--6ikj__B1_Q9I4L6 6ikj \n", "3 8iyi__A1_Q6CVU4--8iyi__B1_Q6CVU4 8iyi \n", "4 3uws__B1_A7A9N3--3uws__A1_A7A9N3 3uws \n", "... ... ... \n", "2319559 6hbg__C28_Q8V635--6hbg__C35_Q8V635 6hbg \n", "2319560 6hbg__A11_Q8V635--6hbg__A54_Q8V635 6hbg \n", "2319561 6hbg__C15_Q8V635--6hbg__D4_Q8V635 6hbg \n", "2319562 6hbg__C33_Q8V635--6hbg__D52_Q8V635 6hbg \n", "2319563 6rwh__A1_P31947--6rwh__A2_P31947 6rwh \n", "\n", " method date release_date resolution label \\\n", "0 X-RAY DIFFRACTION 2021-08-27 2022-04-13 1.599609 BIO \n", "1 X-RAY DIFFRACTION 2011-07-22 2011-08-10 2.330078 BIO \n", "2 X-RAY DIFFRACTION 2018-10-16 2019-03-13 1.759766 BIO \n", "3 X-RAY DIFFRACTION 2023-04-05 2023-06-28 1.900391 BIO \n", "4 X-RAY DIFFRACTION 2011-12-02 2012-06-13 1.700195 BIO \n", "... ... ... ... ... ... \n", "2319559 ELECTRON MICROSCOPY 2018-08-10 2019-03-20 3.160156 BIO \n", "2319560 ELECTRON MICROSCOPY 2018-08-10 2019-03-20 3.160156 \n", "2319561 ELECTRON MICROSCOPY 2018-08-10 2019-03-20 3.160156 XTAL \n", "2319562 ELECTRON MICROSCOPY 2018-08-10 2019-03-20 3.160156 XTAL \n", "2319563 X-RAY DIFFRACTION 2019-06-05 2020-06-17 1.679688 BIO \n", "\n", " probability chain1_id chain2_id ... interface_atom_gaps_4A \\\n", "0 0.576172 R L ... 0 \n", "1 0.983887 R L ... 0 \n", "2 0.543945 R L ... 0 \n", "3 0.992188 R L ... 0 \n", "4 0.996094 R L ... 0 \n", "... ... ... ... ... ... \n", "2319559 0.512207 R L ... 0 \n", "2319560 0.000000 R L ... 0 \n", "2319561 0.491943 R L ... 0 \n", "2319562 0.491943 R L ... 0 \n", "2319563 0.835938 R L ... 4 \n", "\n", " missing_interface_residues_4A interface_atom_gaps_8A \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "... ... ... 
\n", "2319559 0 0 \n", "2319560 0 0 \n", "2319561 0 0 \n", "2319562 0 0 \n", "2319563 0 64 \n", "\n", " missing_interface_residues_8A entity_id_R entity_id_L \\\n", "0 0 1 1 \n", "1 0 1 1 \n", "2 0 1 1 \n", "3 0 1 1 \n", "4 0 2 1 \n", "... ... ... ... \n", "2319559 0 3 3 \n", "2319560 0 1 1 \n", "2319561 0 3 4 \n", "2319562 0 3 4 \n", "2319563 0 1 1 \n", "\n", " pdb_strand_id_R pdb_strand_id_L ECOD_names_R \\\n", "0 A A PF06491 \n", "1 A A F_UNCLASSIFIED,PF13149 \n", "2 A B PF00691 \n", "3 A B PF17284,PF01564 \n", "4 B A \n", "... ... ... ... \n", "2319559 C C PF00073 \n", "2319560 A A PF00073 \n", "2319561 C D PF00073 \n", "2319562 C D PF00073 \n", "2319563 A A PF00244 \n", "\n", " ECOD_names_L \n", "0 PF06491 \n", "1 F_UNCLASSIFIED,PF13149 \n", "2 PF00691 \n", "3 PF17284,PF01564 \n", "4 \n", "... ... \n", "2319559 PF00073 \n", "2319560 PF00073 \n", "2319561 PF02226 \n", "2319562 PF02226 \n", "2319563 PF00244 \n", "\n", "[2319564 rows x 51 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pinder.core import get_metadata\n", "\n", "metadata = get_metadata()\n", "metadata\n" ] }, { "cell_type": "markdown", "id": "00427ee6-105a-4901-b647-52d667cf9eac", "metadata": {}, "source": [ "## Pinder index API" ] }, { "cell_type": "code", "execution_count": 4, "id": "71223020-07c6-4a55-ba37-2343e7b4c400", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " split id pdb_id \\\n", "0 test 7rzb__A1_A0A229LVN5--7rzb__A2_A0A229LVN5 7rzb \n", "1 test 3t2l__A1_Q5LE95--3t2l__A2_Q5LE95 3t2l \n", "2 test 6ikj__A1_Q9I4L6--6ikj__B1_Q9I4L6 6ikj \n", "3 test 8iyi__A1_Q6CVU4--8iyi__B1_Q6CVU4 8iyi \n", "4 test 3uws__B1_A7A9N3--3uws__A1_A7A9N3 3uws \n", "... ... ... ... \n", "2319559 invalid 6hbg__C28_Q8V635--6hbg__C35_Q8V635 6hbg \n", "2319560 invalid 6hbg__A11_Q8V635--6hbg__A54_Q8V635 6hbg \n", "2319561 invalid 6hbg__C15_Q8V635--6hbg__D4_Q8V635 6hbg \n", "2319562 invalid 6hbg__C33_Q8V635--6hbg__D52_Q8V635 6hbg \n", "2319563 invalid 6rwh__A1_P31947--6rwh__A2_P31947 6rwh \n", "\n", " cluster_id cluster_id_R cluster_id_L pinder_s \\\n", "0 cluster_16129_16129 cluster_16129 cluster_16129 False \n", "1 cluster_30933_30933 cluster_30933 cluster_30933 False \n", "2 cluster_1924_1924 cluster_1924 cluster_1924 False \n", "3 cluster_142_142 cluster_142 cluster_142 False \n", "4 cluster_21030_21031 cluster_21030 cluster_21031 False \n", "... ... ... ... ... \n", "2319559 cluster_-1_-1 cluster_-1 cluster_-1 False \n", "2319560 cluster_-1_-1 cluster_-1 cluster_-1 False \n", "2319561 cluster_-1_p cluster_p cluster_-1 False \n", "2319562 cluster_-1_p cluster_p cluster_-1 False \n", "2319563 cluster_2_2 cluster_2 cluster_2 False \n", "\n", " pinder_xl pinder_af2 uniprot_R ... apo_L apo_R_quality \\\n", "0 True True A0A229LVN5 ... False \n", "1 True False Q5LE95 ... False \n", "2 True False Q9I4L6 ... True high \n", "3 True False Q6CVU4 ... False \n", "4 True False A7A9N3 ... False \n", "... ... ... ... ... ... ... \n", "2319559 False False Q8V635 ... False \n", "2319560 False False Q8V635 ... False \n", "2319561 False False Q8V635 ... False \n", "2319562 False False Q8V635 ... False \n", "2319563 False False P31947 ... 
False \n", "\n", " apo_L_quality chain1_neff chain2_neff chain_R chain_L \\\n", "0 287.000000 287.000000 A1 A2 \n", "1 7.175781 7.175781 A1 A2 \n", "2 high 845.000000 845.000000 A1 B1 \n", "3 525.000000 525.000000 A1 B1 \n", "4 147.375000 147.375000 B1 A1 \n", "... ... ... ... ... ... \n", "2319559 37.656250 37.656250 C28 C35 \n", "2319560 37.656250 37.656250 A11 A54 \n", "2319561 37.656250 37.656250 C15 D4 \n", "2319562 37.656250 37.656250 C33 D52 \n", "2319563 457.750000 457.750000 A1 A2 \n", "\n", " contains_antibody contains_antigen contains_enzyme \n", "0 False False False \n", "1 False False False \n", "2 False False False \n", "3 False False False \n", "4 False False False \n", "... ... ... ... \n", "2319559 False False True \n", "2319560 False False True \n", "2319561 False False True \n", "2319562 False False True \n", "2319563 False False False \n", "\n", "[2319564 rows x 34 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pinder.core import PinderSystem, get_index\n", "\n", "index = get_index()\n", "index" ] }, { "cell_type": "markdown", "id": "58385b67-74dd-427b-9ff5-8d7e09ae0bc9", "metadata": {}, "source": [ "### How to get subsets of data from the index " ] }, { "cell_type": "code", "execution_count": 5, "id": "67fb0b80-6e3e-4b33-82c6-ff4364b45cde", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " split id pdb_id \\\n", "0 test 7ubk__B1_P03047--7ubk__A1_P03047 7ubk \n", "1 test 7zj1__B1_P55265--7zj1__A1_P55265 7zj1 \n", "2 test 7zjc__A1_P0DV83--7zjc__A2_P0DV83 7zjc \n", "3 test 7ztw__A1_Q66578--7ztw__B1_Q66578 7ztw \n", "4 test 8ard__A1_Q64331--8ard__A2_Q64331 8ard \n", "5 test 7wbt__A1_Q288C4--7wbt__B1_Q288C4 7wbt \n", "6 test 7z7o__A1_A0A1S4NYF2--7z7o__C1_A0A1S4NYF2 7z7o \n", "7 test 8pte__A1_P00698--8pte__A2_P00698 8pte \n", "8 test 7yka__B1_Q9Y3D6--7yka__A1_Q9Y3D6 7yka \n", "9 test 7ykv__B1_Q58241--7ykv__A1_Q58241 7ykv \n", "10 test 8a60__A1_P06971--8a60__B1_Q38162 8a60 \n", "11 test 7t5y__A1_P0A7E1--7t5y__B1_P0A7E1 7t5y \n", "12 test 8oru__A1_O93732--8oru__B1_O93732 8oru \n", "13 test 7z6m__A1_A0A0H3LM39--7z6m__A2_A0A0H3LM39 7z6m \n", "14 test 8cnx__A1_Q68T42--8cnx__B1_Q68T42 8cnx \n", "15 test 7wwo__B1_Q5SH57--7wwo__A1_Q5SH57 7wwo \n", "16 test 8d0m__A1_P28907--8d0m__A2_P28907 8d0m \n", "17 test 8avu__A1_Q8GPI4--8avu__A2_Q8GPI4 8avu \n", "18 test 8i2e__A1_O34841--8i2e__B1_P54421 8i2e \n", "19 test 8aeu__A1_Q00987--8aeu__A2_Q00987 8aeu \n", "20 test 7vso__A1_P02945--7vso__A2_P02945 7vso \n", "21 test 7yuj__B1_Q9BYM8--7yuj__A1_Q9BYM8 7yuj \n", "22 test 8pvm__A1_P29166--8pvm__B1_P29166 8pvm \n", "23 test 7yo8__A1_P60520--7yo8__A2_P60520 7yo8 \n", "24 test 7y51__A1_Q8RBF4--7y51__A2_Q8RBF4 7y51 \n", "25 test 7tvh__B1_Q9I2Q1--7tvh__A1_Q9I2Q1 7tvh \n", "26 test 8bwv__D2_A0A482M8M0--8bwv__A1_A0A482M8M0 8bwv \n", "27 test 7t91__A1_P08151--7t91__B1_P08151 7t91 \n", "28 test 7zoo__B1_A0A979GQH9--7zoo__A1_A0A979GQH9 7zoo \n", "29 test 7x4b__A1_A0A2D0TCG3--7x4b__B1_A0A2D0TCG3 7x4b \n", "\n", " cluster_id cluster_id_R cluster_id_L pinder_s pinder_xl \\\n", "0 cluster_31523_31523 cluster_31523 cluster_31523 False True \n", "1 cluster_1209_1209 cluster_1209 cluster_1209 False True \n", "2 cluster_26441_26441 cluster_26441 cluster_26441 False True \n", "3 cluster_22347_22347 cluster_22347 cluster_22347 False True \n", "4 cluster_34280_34280 
cluster_34280 cluster_34280 False True \n", "5 cluster_7371_7371 cluster_7371 cluster_7371 True True \n", "6 cluster_91_91 cluster_91 cluster_91 False True \n", "7 cluster_274_274 cluster_274 cluster_274 False True \n", "8 cluster_13298_13298 cluster_13298 cluster_13298 False True \n", "9 cluster_5358_5358 cluster_5358 cluster_5358 False True \n", "10 cluster_12107_26846 cluster_12107 cluster_26846 False True \n", "11 cluster_7939_7939 cluster_7939 cluster_7939 False True \n", "12 cluster_14951_14951 cluster_14951 cluster_14951 False True \n", "13 cluster_5362_5362 cluster_5362 cluster_5362 False True \n", "14 cluster_609_609 cluster_609 cluster_609 False True \n", "15 cluster_5115_5115 cluster_5115 cluster_5115 False True \n", "16 cluster_2975_2975 cluster_2975 cluster_2975 True True \n", "17 cluster_29712_29712 cluster_29712 cluster_29712 False True \n", "18 cluster_11087_12465 cluster_12465 cluster_11087 True True \n", "19 cluster_1537_1537 cluster_1537 cluster_1537 False True \n", "20 cluster_1035_1035 cluster_1035 cluster_1035 False True \n", "21 cluster_19439_19439 cluster_19439 cluster_19439 True True \n", "22 cluster_6440_6440 cluster_6440 cluster_6440 False True \n", "23 cluster_1022_1022 cluster_1022 cluster_1022 False True \n", "24 cluster_711_711 cluster_711 cluster_711 False True \n", "25 cluster_8106_8106 cluster_8106 cluster_8106 False True \n", "26 cluster_27613_27613 cluster_27613 cluster_27613 False True \n", "27 cluster_1801_1801 cluster_1801 cluster_1801 False True \n", "28 cluster_32633_32633 cluster_32633 cluster_32633 False True \n", "29 cluster_7334_7334 cluster_7334 cluster_7334 False True \n", "\n", " pinder_af2 uniprot_R ... apo_L apo_R_quality apo_L_quality chain1_neff \\\n", "0 True P03047 ... True high high 81.750000 \n", "1 True P55265 ... True high high 174.250000 \n", "2 True P0DV83 ... True high high 6.925781 \n", "3 True Q66578 ... True high high 13.382812 \n", "4 True Q64331 ... True high high 444.750000 \n", "5 True Q288C4 ... 
True high high 169.500000 \n", "6 True A0A1S4NYF2 ... True high high 10.632812 \n", "7 True P00698 ... True high high 254.375000 \n", "8 True Q9Y3D6 ... True high high 132.875000 \n", "9 True Q58241 ... True high high 171.625000 \n", "10 True P06971 ... True high high 288.000000 \n", "11 True P0A7E1 ... True high high 735.500000 \n", "12 True O93732 ... True high high 103.187500 \n", "13 True A0A0H3LM39 ... True high high 533.000000 \n", "14 True Q68T42 ... True high high 45.000000 \n", "15 True Q5SH57 ... True high high 2.939453 \n", "16 True P28907 ... True high high 48.312500 \n", "17 True Q8GPI4 ... True high high 11.882812 \n", "18 True O34841 ... True high high 9.031250 \n", "19 True Q00987 ... True high high 350.750000 \n", "20 True P02945 ... True high high 267.250000 \n", "21 True Q9BYM8 ... True high high 266.750000 \n", "22 True P29166 ... True high high 392.500000 \n", "23 True P60520 ... True high high 385.500000 \n", "24 True Q8RBF4 ... True high high 954.500000 \n", "25 True Q9I2Q1 ... True high high 209.125000 \n", "26 True A0A482M8M0 ... True high high 395.000000 \n", "27 True P08151 ... True high high 321.250000 \n", "28 True A0A979GQH9 ... True high high 419.250000 \n", "29 True A0A2D0TCG3 ... 
True high high 9.453125 \n", "\n", " chain2_neff chain_R chain_L contains_antibody contains_antigen \\\n", "0 81.750000 B1 A1 False False \n", "1 174.250000 B1 A1 False False \n", "2 6.925781 A1 A2 False False \n", "3 13.382812 A1 B1 False False \n", "4 444.750000 A1 A2 False False \n", "5 169.500000 A1 B1 False False \n", "6 10.632812 A1 C1 False False \n", "7 254.375000 A1 A2 False False \n", "8 132.875000 B1 A1 False False \n", "9 171.625000 B1 A1 False False \n", "10 2.734375 A1 B1 False False \n", "11 735.500000 A1 B1 False False \n", "12 103.187500 A1 B1 False False \n", "13 533.000000 A1 A2 False False \n", "14 45.000000 A1 B1 False False \n", "15 2.939453 B1 A1 False False \n", "16 48.312500 A1 A2 False False \n", "17 11.882812 A1 A2 False False \n", "18 865.000000 A1 B1 False False \n", "19 350.750000 A1 A2 False False \n", "20 267.250000 A1 A2 False False \n", "21 266.750000 B1 A1 False False \n", "22 392.500000 A1 B1 False False \n", "23 385.500000 A1 A2 False False \n", "24 954.500000 A1 A2 False False \n", "25 209.125000 B1 A1 False False \n", "26 395.000000 D2 A1 False False \n", "27 321.250000 A1 B1 False False \n", "28 419.250000 B1 A1 False False \n", "29 9.453125 A1 B1 False False \n", "\n", " contains_enzyme \n", "0 False \n", "1 True \n", "2 False \n", "3 False \n", "4 False \n", "5 False \n", "6 False \n", "7 True \n", "8 False \n", "9 False \n", "10 False \n", "11 True \n", "12 True \n", "13 False \n", "14 True \n", "15 False \n", "16 True \n", "17 False \n", "18 True \n", "19 True \n", "20 False \n", "21 True \n", "22 True \n", "23 False \n", "24 False \n", "25 True \n", "26 False \n", "27 False \n", "28 False \n", "29 False \n", "\n", "[30 rows x 34 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Example: select every pinder_af2 system that has apo structures for both receptor and ligand\n", "\n", "af2_apo = index.query(\n", " 'pinder_af2 == True and apo_R and apo_L'\n", ").reset_index(drop=True)\n", "af2_apo\n" ] }, { "cell_type": 
"markdown", "id": "6b19ee41-8e9e-4657-8f39-77fdfc42becb", "metadata": {}, "source": [ "### Finding the existing local filepaths for systems (without re-writing them per system)" ] }, { "cell_type": "code", "execution_count": 6, "id": "aa97b738-8a64-4e43-b1dc-17b226c470fb", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/30 [00:00 does not exist! Verify that the provided GCS path points to a valid file!\n", " 43%|█████████████████████████████████████████████████████████████████████████████████████▊ | 13/30 [00:03<00:03, 5.51it/s]2024-06-20 10:52:40,671 | pinder.core.utils.cloud:368 | INFO : Gsutil process_many=download_to_filename, threads=2, items=2\n", "2024-06-20 10:52:40,837 | pinder.core.utils.cloud.process_many:27 | ERROR : runtime failed: 0.17s\n", "2024-06-20 10:52:40,838 | pinder.core.loader.loader:35 | ERROR : Requested storage blob <2024-02/pdbs/af__Q68T42.pdb> does not exist! Verify that the provided GCS path points to a valid file!\n", "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:04<00:00, 6.25it/s]\n" ] } ], "source": [ "from pinder.core import get_systems\n", "\n", "\n", "local_paths = {}\n", "for system in get_systems(list(af2_apo.id)):\n", " local_paths[system.entry.id] = system.filepaths\n", " " ] }, { "cell_type": "markdown", "id": "5e2c7ce8-f9a6-4cf1-80bf-ebcab410baf1", "metadata": {}, "source": [ "### It should be cached, no need to download if you re-run" ] }, { "cell_type": "code", "execution_count": 7, "id": "7ade9b56-8489-43de-9554-18229a0651c7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/30 [00:00 does not exist! 
Verify that the provided GCS path points to a valid file!\n", " 43%|█████████████████████████████████████████████████████████████████████████████████████▊ | 13/30 [00:01<00:02, 7.32it/s]2024-06-20 10:52:44,010 | pinder.core.utils.cloud:368 | INFO : Gsutil process_many=download_to_filename, threads=2, items=2\n", "2024-06-20 10:52:44,105 | pinder.core.utils.cloud.process_many:27 | ERROR : runtime failed: 0.10s\n", "2024-06-20 10:52:44,106 | pinder.core.loader.loader:35 | ERROR : Requested storage blob <2024-02/pdbs/af__Q68T42.pdb> does not exist! Verify that the provided GCS path points to a valid file!\n", "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:03<00:00, 9.55it/s]\n" ] }, { "data": { "text/plain": [ "{'7zj1__B1_P55265--7zj1__A1_P55265': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7zj1__B1_P55265-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7zj1__A1_P55265-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2mdr__A1_P55265.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2mdr__A1_P55265.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P55265.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P55265.pdb')},\n", " '7zjc__A1_P0DV83--7zjc__A2_P0DV83': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7zjc__A1_P0DV83-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7zjc__A2_P0DV83-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7qdv__A1_P0DV83.pdb'),\n", " 'apo_ligand': 
PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7qdv__A1_P0DV83.pdb'),\n", " 'pred_receptor': None,\n", " 'pred_ligand': None},\n", " '7ztw__A1_Q66578--7ztw__B1_Q66578': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7ztw__A1_Q66578-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7ztw__B1_Q66578-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7zu3__A1_Q66578.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7zu4__A1_Q66578.pdb'),\n", " 'pred_receptor': None,\n", " 'pred_ligand': None},\n", " '8ard__A1_Q64331--8ard__A2_Q64331': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8ard__A1_Q64331-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8ard__A2_Q64331-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2ld3__A1_Q64331.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2ld3__A1_Q64331.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q64331.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q64331.pdb')},\n", " '7wbt__A1_Q288C4--7wbt__B1_Q288C4': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7wbt__A1_Q288C4-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7wbt__B1_Q288C4-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7wbu__A1_Q288C4.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7wbu__A1_Q288C4.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q288C4.pdb'),\n", " 
'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q288C4.pdb')},\n", " '7z7o__A1_A0A1S4NYF2--7z7o__C1_A0A1S4NYF2': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7z7o__A1_A0A1S4NYF2-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7z7o__C1_A0A1S4NYF2-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7z7p__A1_A0A1S4NYF2.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7z7p__A1_A0A1S4NYF2.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__A0A1S4NYF2.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__A0A1S4NYF2.pdb')},\n", " '8pte__A1_P00698--8pte__A2_P00698': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8pte__A1_P00698-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8pte__A2_P00698-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2q0m__A1_P00698.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2q0m__A1_P00698.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P00698.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P00698.pdb')},\n", " '7yka__B1_Q9Y3D6--7yka__A1_Q9Y3D6': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7yka__B1_Q9Y3D6-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7yka__A1_Q9Y3D6-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/1nzn__A1_Q9Y3D6.pdb'),\n", " 'apo_ligand': 
PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/1nzn__A1_Q9Y3D6.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q9Y3D6.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q9Y3D6.pdb')},\n", " '7ykv__B1_Q58241--7ykv__A1_Q58241': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7ykv__B1_Q58241-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7ykv__A1_Q58241-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/6tvv__A1_Q58241.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/6tvv__A1_Q58241.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q58241.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q58241.pdb')},\n", " '8a60__A1_P06971--8a60__B1_Q38162': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8a60__A1_P06971-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8a60__B1_Q38162-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2fcp__A1_P06971.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7qjf__A1_Q38162.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P06971.pdb'),\n", " 'pred_ligand': None},\n", " '7t5y__A1_P0A7E1--7t5y__B1_P0A7E1': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7t5y__A1_P0A7E1-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7t5y__B1_P0A7E1-L.pdb'),\n", " 'apo_receptor': 
PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/1f76__A1_P0A7E1.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/1f76__A1_P0A7E1.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P0A7E1.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P0A7E1.pdb')},\n", " '8oru__A1_O93732--8oru__B1_O93732': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8oru__A1_O93732-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8oru__B1_O93732-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8ork__A1_O93732.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8ork__A1_O93732.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__O93732.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__O93732.pdb')},\n", " '7z6m__A1_A0A0H3LM39--7z6m__A2_A0A0H3LM39': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7z6m__A1_A0A0H3LM39-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7z6m__A2_A0A0H3LM39-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/5tsb__A1_A0A0H3LM39.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/5tsb__A1_A0A0H3LM39.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__A0A0H3LM39.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__A0A0H3LM39.pdb')},\n", " '7wwo__B1_Q5SH57--7wwo__A1_Q5SH57': {'holo_receptor': 
PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7wwo__B1_Q5SH57-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7wwo__A1_Q5SH57-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7wrk__A1_Q5SH57.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7wrk__A1_Q5SH57.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q5SH57.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q5SH57.pdb')},\n", " '8d0m__A1_P28907--8d0m__A2_P28907': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8d0m__A1_P28907-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8d0m__A2_P28907-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8p8c__A1_P28907.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8p8c__A1_P28907.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P28907.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P28907.pdb')},\n", " '8avu__A1_Q8GPI4--8avu__A2_Q8GPI4': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8avu__A1_Q8GPI4-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8avu__A2_Q8GPI4-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8avs__A1_Q8GPI4.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8avs__A1_Q8GPI4.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q8GPI4.pdb'),\n", " 'pred_ligand': 
PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q8GPI4.pdb')},\n", " '8i2e__A1_O34841--8i2e__B1_P54421': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8i2e__A1_O34841-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8i2e__B1_P54421-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2rsx__A1_O34841.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8i2d__A1_P54421.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__O34841.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P54421.pdb')},\n", " '8aeu__A1_Q00987--8aeu__A2_Q00987': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8aeu__A1_Q00987-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8aeu__A2_Q00987-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/6kzu__A1_Q00987.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/6kzu__A1_Q00987.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q00987.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q00987.pdb')},\n", " '7vso__A1_P02945--7vso__A2_P02945': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7vso__A1_P02945-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7vso__A2_P02945-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/5b34__A1_P02945.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/5b34__A1_P02945.pdb'),\n", " 
'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P02945.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P02945.pdb')},\n", " '7yuj__B1_Q9BYM8--7yuj__A1_Q9BYM8': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7yuj__B1_Q9BYM8-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7yuj__A1_Q9BYM8-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8bvl__A1_Q9BYM8.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8bvl__A1_Q9BYM8.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q9BYM8.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q9BYM8.pdb')},\n", " '8pvm__A1_P29166--8pvm__B1_P29166': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8pvm__A1_P29166-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8pvm__B1_P29166-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8aio__A1_P29166.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/5byq__A1_P29166.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P29166.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P29166.pdb')},\n", " '7yo8__A1_P60520--7yo8__A2_P60520': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7yo8__A1_P60520-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7yo8__A2_P60520-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7lk3__A1_P60520.pdb'),\n", " 
'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7lk3__A1_P60520.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P60520.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P60520.pdb')},\n", " '7y51__A1_Q8RBF4--7y51__A2_Q8RBF4': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7y51__A1_Q8RBF4-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7y51__A2_Q8RBF4-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7fbw__A1_Q8RBF4.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7fbw__A1_Q8RBF4.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q8RBF4.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q8RBF4.pdb')},\n", " '7tvh__B1_Q9I2Q1--7tvh__A1_Q9I2Q1': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7tvh__B1_Q9I2Q1-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7tvh__A1_Q9I2Q1-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/4fgd__A1_Q9I2Q1.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/4f4m__A1_Q9I2Q1.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q9I2Q1.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q9I2Q1.pdb')},\n", " '8bwv__D2_A0A482M8M0--8bwv__A1_A0A482M8M0': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8bwv__D2_A0A482M8M0-R.pdb'),\n", " 'holo_ligand': 
PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/8bwv__A1_A0A482M8M0-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8bbz__A1_A0A482M8M0.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/8bhu__A1_A0A482M8M0.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__A0A482M8M0.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__A0A482M8M0.pdb')},\n", " '7t91__A1_P08151--7t91__B1_P08151': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7t91__A1_P08151-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7t91__B1_P08151-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2gli__C1_P08151.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/2gli__C1_P08151.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P08151.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__P08151.pdb')},\n", " '7zoo__B1_A0A979GQH9--7zoo__A1_A0A979GQH9': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7zoo__B1_A0A979GQH9-R.pdb'),\n", " 'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7zoo__A1_A0A979GQH9-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7zoh__A1_A0A979GQH9.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7zoh__A1_A0A979GQH9.pdb'),\n", " 'pred_receptor': None,\n", " 'pred_ligand': None},\n", " '7x4b__A1_A0A2D0TCG3--7x4b__B1_A0A2D0TCG3': {'holo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7x4b__A1_A0A2D0TCG3-R.pdb'),\n", " 
'holo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/test_set_pdbs/7x4b__B1_A0A2D0TCG3-L.pdb'),\n", " 'apo_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7x31__A1_A0A2D0TCG3.pdb'),\n", " 'apo_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/7x31__A1_A0A2D0TCG3.pdb'),\n", " 'pred_receptor': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__A0A2D0TCG3.pdb'),\n", " 'pred_ligand': PosixPath('/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__A0A2D0TCG3.pdb')}}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "local_paths = {}\n", "for system in get_systems(list(af2_apo.id)):\n", " local_paths[system.entry.id] = system.filepaths\n", "local_paths " ] }, { "cell_type": "markdown", "id": "4a5dfd80-5427-4bd9-8433-fae50870f72a", "metadata": {}, "source": [ "### Pinder filters" ] }, { "cell_type": "code", "execution_count": 8, "id": "57c615bb-711c-49c4-822f-750f3349c715", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " at 0x1c91e9970>" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pinder.core.loader import filters\n", "\n", "pinder_id = \"1df0__A1_Q07009--1df0__B1_Q64537\"\n", "dimer = PinderSystem(entry=pinder_id)\n", "\n", "base_filters = [\n", " filters.FilterByMissingHolo(),\n", " filters.FilterSubByContacts(min_contacts=5, radius=10.0, calpha_only=True),\n", " filters.FilterByHoloElongation(max_var_contribution=0.92),\n", " filters.FilterDetachedHolo(radius=12, max_components=2),\n", "]\n", "sub_filters = [\n", " filters.FilterSubByAtomTypes(min_atom_types=4),\n", " filters.FilterByHoloOverlap(min_overlap=5),\n", " filters.FilterByHoloSeqIdentity(min_sequence_identity=0.8),\n", " filters.FilterSubLengths(min_length=0, max_length=1000),\n", " filters.FilterSubRmsds(rmsd_cutoff=7.5),\n", " filters.FilterByElongation(max_var_contribution=0.92),\n", " 
filters.FilterDetachedSub(radius=12, max_components=2),\n", "]\n", "dimers = [dimer]\n", "# map/filter bind each filter object when the stage is created. A generator expression\n", "# like (sub_filter(d) for d in dimers) looks up `sub_filter` only when it is consumed,\n", "# so every chained stage would end up applying the last filter from the loop.\n", "for sub_filter in sub_filters:\n", "    dimers = map(sub_filter, dimers)\n", "\n", "for base_filter in base_filters:\n", "    dimers = filter(base_filter, dimers)\n", "\n", "dimers" ] }, { "cell_type": "code", "execution_count": 9, "id": "22ed5dad-f294-4fc2-8792-39bde4c885d6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PinderSystem(\n", "entry = IndexEntry(\n", " (\n", " 'split',\n", " 'invalid',\n", " ),\n", " (\n", " 'id',\n", " '1df0__A1_Q07009--1df0__B1_Q64537',\n", " ),\n", " (\n", " 'pdb_id',\n", " '1df0',\n", " ),\n", " (\n", " 'cluster_id',\n", " 'cluster_1030_1030',\n", " ),\n", " (\n", " 'cluster_id_R',\n", " 'cluster_1030',\n", " ),\n", " (\n", " 'cluster_id_L',\n", " 'cluster_1030',\n", " ),\n", " (\n", " 'pinder_s',\n", " False,\n", " ),\n", " (\n", " 'pinder_xl',\n", " False,\n", " ),\n", " (\n", " 'pinder_af2',\n", " False,\n", " ),\n", " (\n", " 'uniprot_R',\n", " 'Q07009',\n", " ),\n", " (\n", " 'uniprot_L',\n", " 'Q64537',\n", " ),\n", " (\n", " 'holo_R_pdb',\n", " '1df0__A1_Q07009-R.pdb',\n", " ),\n", " (\n", " 'holo_L_pdb',\n", " '1df0__B1_Q64537-L.pdb',\n", " ),\n", " (\n", " 'predicted_R_pdb',\n", " 'af__Q07009.pdb',\n", " ),\n", " (\n", " 'predicted_L_pdb',\n", " 'af__Q64537.pdb',\n", " ),\n", " (\n", " 'apo_R_pdb',\n", " '',\n", " ),\n", " (\n", " 'apo_L_pdb',\n", " '',\n", " ),\n", " (\n", " 'apo_R_pdbs',\n", " '',\n", " ),\n", " (\n", " 'apo_L_pdbs',\n", " '',\n", " ),\n", " (\n", " 'holo_R',\n", " True,\n", " ),\n", " (\n", " 'holo_L',\n", " True,\n", " ),\n", " (\n", " 'predicted_R',\n", " True,\n", " ),\n", " (\n", " 'predicted_L',\n", " True,\n", " ),\n", " (\n", " 'apo_R',\n", " False,\n", " ),\n", " (\n", " 'apo_L',\n", " False,\n", " ),\n", " (\n", " 'apo_R_quality',\n", " '',\n", " ),\n", " (\n", " 'apo_L_quality',\n", " '',\n", " ),\n", " (\n", " 'chain1_neff',\n", " 492.25,\n", " ),\n", " (\n",
" 'chain2_neff',\n", " 528.0,\n", " ),\n", " (\n", " 'chain_R',\n", " 'A1',\n", " ),\n", " (\n", " 'chain_L',\n", " 'B1',\n", " ),\n", " (\n", " 'contains_antibody',\n", " False,\n", " ),\n", " (\n", " 'contains_antigen',\n", " False,\n", " ),\n", " (\n", " 'contains_enzyme',\n", " True,\n", " ),\n", ")\n", "native=Structure(\n", " filepath=/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/1df0__A1_Q07009--1df0__B1_Q64537.pdb,\n", " uniprot_map=None,\n", " pinder_id='1df0__A1_Q07009--1df0__B1_Q64537',\n", " atom_array= with shape (6391,),\n", ")\n", "holo_receptor=Structure(\n", " filepath=/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/1df0__A1_Q07009-R.pdb,\n", " uniprot_map=/Users/danielkovtun/.local/share/pinder/2024-02/mappings/1df0__A1_Q07009-R.parquet,\n", " pinder_id='1df0__A1_Q07009-R',\n", " atom_array= with shape (4964,),\n", ")\n", "holo_ligand=Structure(\n", " filepath=/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/1df0__B1_Q64537-L.pdb,\n", " uniprot_map=/Users/danielkovtun/.local/share/pinder/2024-02/mappings/1df0__B1_Q64537-L.parquet,\n", " pinder_id='1df0__B1_Q64537-L',\n", " atom_array= with shape (1427,),\n", ")\n", "apo_receptor=None\n", "apo_ligand=None\n", "pred_receptor=Structure(\n", " filepath=/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q07009.pdb,\n", " uniprot_map=None,\n", " pinder_id='af__Q07009',\n", " atom_array= with shape (5631,),\n", ")\n", "pred_ligand=Structure(\n", " filepath=/Users/danielkovtun/.local/share/pinder/2024-02/pdbs/af__Q64537.pdb,\n", " uniprot_map=None,\n", " pinder_id='af__Q64537',\n", " atom_array= with shape (2006,),\n", ")\n", ")" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(dimers)[0]" ] } ], "metadata": { "kernelspec": { "display_name": "pinder", "language": "python", "name": "pinder" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": 
"python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }