{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "08afcefc",
   "metadata": {},
   "source": [
    "# Классификация астероидов прототипной нейросетью\n",
    "\n",
    "В работе автоматически определяется ключевая зависимая переменная, выполняется предобработка данных и строится **прототипная нейросеть типа RBF**:\n",
    "- для каждого класса выделяются прототипы методом `MiniBatchKMeans`;\n",
    "- расстояния до прототипов преобразуются в радиальные отклики;\n",
    "- выходной слой обучается логистической регрессией по откликам прототипов.\n",
    "\n",
    "Далее вычисляются:\n",
    "- метрики качества на train / validation / test;\n",
    "- глобальная важность исходных признаков методом перестановок;\n",
    "- основные прототипы, их опорные объекты и вклады признаков внутри прототипов.\n",
    "\n",
    "Финальный текстовый вывод в ноутбуке **не приводится**."
   ]
  },
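  {
   "cell_type": "markdown",
   "id": "rbf-sketch-note",
   "metadata": {},
   "source": [
    "The three steps listed above can be sketched on synthetic 2-D data rather than the asteroid table. The Gaussian kernel `exp(-gamma * d**2)`, the value of `gamma`, the choice of three prototypes per class, and every `sketch_*` / `toy_*` name are assumptions made for illustration only; they are not taken from the model built later in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "rbf-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch only: toy data, assumed Gaussian kernel, hypothetical names.\n",
    "import numpy as np\n",
    "from sklearn.cluster import MiniBatchKMeans\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import pairwise_distances\n",
    "\n",
    "\n",
    "def sketch_rbf_features(X, prototypes, gamma=0.5):\n",
    "    # Radial responses exp(-gamma * d^2), one column per prototype (assumed kernel).\n",
    "    d = pairwise_distances(X, prototypes)\n",
    "    return np.exp(-gamma * d ** 2)\n",
    "\n",
    "\n",
    "toy_rng = np.random.default_rng(0)\n",
    "toy_X = np.vstack([\n",
    "    toy_rng.normal(-2.0, 1.0, size=(200, 2)),  # class 0 cloud\n",
    "    toy_rng.normal(2.0, 1.0, size=(200, 2)),   # class 1 cloud\n",
    "])\n",
    "toy_y = np.repeat([0, 1], 200)\n",
    "\n",
    "# Step 1: prototypes are the k-means centres fitted separately within each class.\n",
    "toy_prototypes = np.vstack([\n",
    "    MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)\n",
    "    .fit(toy_X[toy_y == label])\n",
    "    .cluster_centers_\n",
    "    for label in (0, 1)\n",
    "])\n",
    "\n",
    "# Step 2: distances to the prototypes become radial responses.\n",
    "toy_phi = sketch_rbf_features(toy_X, toy_prototypes)\n",
    "\n",
    "# Step 3: the output layer is a logistic regression on those responses.\n",
    "toy_head = LogisticRegression(max_iter=1000).fit(toy_phi, toy_y)\n",
    "print(\"response matrix shape:\", toy_phi.shape)\n",
    "print(\"toy training accuracy:\", toy_head.score(toy_phi, toy_y))"
   ]
  },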
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9b6fb9eb",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import re\n",
    "import glob\n",
    "import warnings\n",
    "from pathlib import Path\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.compose import ColumnTransformer\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n",
    "from sklearn.cluster import MiniBatchKMeans\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import (\n",
    "    accuracy_score,\n",
    "    balanced_accuracy_score,\n",
    "    precision_score,\n",
    "    recall_score,\n",
    "    f1_score,\n",
    "    roc_auc_score,\n",
    "    confusion_matrix,\n",
    "    classification_report,\n",
    ")\n",
    "from sklearn.metrics import pairwise_distances, pairwise_distances_argmin_min\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "SEED = 42\n",
    "np.random.seed(SEED)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "dd08087c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Используемый файл: Asteroid_Updated.csv\n",
      "Форма датасета: (839714, 31)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>a</th>\n",
       "      <th>e</th>\n",
       "      <th>i</th>\n",
       "      <th>om</th>\n",
       "      <th>w</th>\n",
       "      <th>q</th>\n",
       "      <th>ad</th>\n",
       "      <th>per_y</th>\n",
       "      <th>data_arc</th>\n",
       "      <th>...</th>\n",
       "      <th>UB</th>\n",
       "      <th>IR</th>\n",
       "      <th>spec_B</th>\n",
       "      <th>spec_T</th>\n",
       "      <th>G</th>\n",
       "      <th>moid</th>\n",
       "      <th>class</th>\n",
       "      <th>n</th>\n",
       "      <th>per</th>\n",
       "      <th>ma</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Ceres</td>\n",
       "      <td>2.769165</td>\n",
       "      <td>0.076009</td>\n",
       "      <td>10.594067</td>\n",
       "      <td>80.305532</td>\n",
       "      <td>73.597694</td>\n",
       "      <td>2.558684</td>\n",
       "      <td>2.979647</td>\n",
       "      <td>4.608202</td>\n",
       "      <td>8822.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.426</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "      <td>G</td>\n",
       "      <td>0.12</td>\n",
       "      <td>1.59478</td>\n",
       "      <td>MBA</td>\n",
       "      <td>0.213885</td>\n",
       "      <td>1683.145708</td>\n",
       "      <td>77.372096</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Pallas</td>\n",
       "      <td>2.772466</td>\n",
       "      <td>0.230337</td>\n",
       "      <td>34.836234</td>\n",
       "      <td>173.080063</td>\n",
       "      <td>310.048857</td>\n",
       "      <td>2.133865</td>\n",
       "      <td>3.411067</td>\n",
       "      <td>4.616444</td>\n",
       "      <td>72318.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.284</td>\n",
       "      <td>NaN</td>\n",
       "      <td>B</td>\n",
       "      <td>B</td>\n",
       "      <td>0.11</td>\n",
       "      <td>1.23324</td>\n",
       "      <td>MBA</td>\n",
       "      <td>0.213503</td>\n",
       "      <td>1686.155999</td>\n",
       "      <td>59.699133</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Juno</td>\n",
       "      <td>2.669150</td>\n",
       "      <td>0.256942</td>\n",
       "      <td>12.988919</td>\n",
       "      <td>169.852760</td>\n",
       "      <td>248.138626</td>\n",
       "      <td>1.983332</td>\n",
       "      <td>3.354967</td>\n",
       "      <td>4.360814</td>\n",
       "      <td>72684.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.433</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Sk</td>\n",
       "      <td>S</td>\n",
       "      <td>0.32</td>\n",
       "      <td>1.03454</td>\n",
       "      <td>MBA</td>\n",
       "      <td>0.226019</td>\n",
       "      <td>1592.787285</td>\n",
       "      <td>34.925016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Vesta</td>\n",
       "      <td>2.361418</td>\n",
       "      <td>0.088721</td>\n",
       "      <td>7.141771</td>\n",
       "      <td>103.810804</td>\n",
       "      <td>150.728541</td>\n",
       "      <td>2.151909</td>\n",
       "      <td>2.570926</td>\n",
       "      <td>3.628837</td>\n",
       "      <td>24288.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.492</td>\n",
       "      <td>NaN</td>\n",
       "      <td>V</td>\n",
       "      <td>V</td>\n",
       "      <td>0.32</td>\n",
       "      <td>1.13948</td>\n",
       "      <td>MBA</td>\n",
       "      <td>0.271609</td>\n",
       "      <td>1325.432765</td>\n",
       "      <td>95.861936</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Astraea</td>\n",
       "      <td>2.574249</td>\n",
       "      <td>0.191095</td>\n",
       "      <td>5.366988</td>\n",
       "      <td>141.576605</td>\n",
       "      <td>358.687607</td>\n",
       "      <td>2.082324</td>\n",
       "      <td>3.066174</td>\n",
       "      <td>4.130323</td>\n",
       "      <td>63507.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.411</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "      <td>S</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.09589</td>\n",
       "      <td>MBA</td>\n",
       "      <td>0.238632</td>\n",
       "      <td>1508.600458</td>\n",
       "      <td>282.366289</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 31 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      name         a         e          i          om           w         q  \\\n",
       "0    Ceres  2.769165  0.076009  10.594067   80.305532   73.597694  2.558684   \n",
       "1   Pallas  2.772466  0.230337  34.836234  173.080063  310.048857  2.133865   \n",
       "2     Juno  2.669150  0.256942  12.988919  169.852760  248.138626  1.983332   \n",
       "3    Vesta  2.361418  0.088721   7.141771  103.810804  150.728541  2.151909   \n",
       "4  Astraea  2.574249  0.191095   5.366988  141.576605  358.687607  2.082324   \n",
       "\n",
       "         ad     per_y  data_arc  ...     UB  IR  spec_B spec_T     G     moid  \\\n",
       "0  2.979647  4.608202    8822.0  ...  0.426 NaN       C      G  0.12  1.59478   \n",
       "1  3.411067  4.616444   72318.0  ...  0.284 NaN       B      B  0.11  1.23324   \n",
       "2  3.354967  4.360814   72684.0  ...  0.433 NaN      Sk      S  0.32  1.03454   \n",
       "3  2.570926  3.628837   24288.0  ...  0.492 NaN       V      V  0.32  1.13948   \n",
       "4  3.066174  4.130323   63507.0  ...  0.411 NaN       S      S   NaN  1.09589   \n",
       "\n",
       "  class         n          per          ma  \n",
       "0   MBA  0.213885  1683.145708   77.372096  \n",
       "1   MBA  0.213503  1686.155999   59.699133  \n",
       "2   MBA  0.226019  1592.787285   34.925016  \n",
       "3   MBA  0.271609  1325.432765   95.861936  \n",
       "4   MBA  0.238632  1508.600458  282.366289  \n",
       "\n",
       "[5 rows x 31 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Поиск CSV-файла\n",
    "candidate_files = [\n",
    "    \"Asteroid_Updated.csv\",\n",
    "    \"asteroid_updated.csv\",\n",
    "    \"Asteroid Updated.csv\",\n",
    "    \"asteroid.csv\",\n",
    "    \"asteroid_dataset.csv\",\n",
    "    \"Asteroid_Dataset.csv\"\n",
    "]\n",
    "\n",
    "csv_path = None\n",
    "for cand in candidate_files:\n",
    "    if Path(cand).exists():\n",
    "        csv_path = cand\n",
    "        break\n",
    "\n",
    "if csv_path is None:\n",
    "    matches = glob.glob(\"*.csv\")\n",
    "    if len(matches) == 1:\n",
    "        csv_path = matches[0]\n",
    "    elif len(matches) > 1:\n",
    "        # стараемся выбрать наиболее подходящий по имени\n",
    "        scored = []\n",
    "        for m in matches:\n",
    "            score = 0\n",
    "            low = m.lower()\n",
    "            if \"asteroid\" in low:\n",
    "                score += 5\n",
    "            if \"updated\" in low:\n",
    "                score += 2\n",
    "            scored.append((score, m))\n",
    "        scored = sorted(scored, reverse=True)\n",
    "        csv_path = scored[0][1]\n",
    "\n",
    "if csv_path is None:\n",
    "    raise FileNotFoundError(\"CSV-файл не найден. Поместите файл Asteroid_Updated.csv в ту же папку, что и ноутбук.\")\n",
    "\n",
    "print(\"Используемый файл:\", csv_path)\n",
    "df_raw = pd.read_csv(csv_path)\n",
    "print(\"Форма датасета:\", df_raw.shape)\n",
    "display(df_raw.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8a8386bf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original_name</th>\n",
       "      <th>normalized_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>name</td>\n",
       "      <td>name</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>e</td>\n",
       "      <td>e</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>i</td>\n",
       "      <td>i</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>om</td>\n",
       "      <td>om</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>w</td>\n",
       "      <td>w</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>q</td>\n",
       "      <td>q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>ad</td>\n",
       "      <td>ad</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>per_y</td>\n",
       "      <td>per_y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>data_arc</td>\n",
       "      <td>data_arc</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>condition_code</td>\n",
       "      <td>condition_code</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>n_obs_used</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>H</td>\n",
       "      <td>h</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>neo</td>\n",
       "      <td>neo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>pha</td>\n",
       "      <td>pha</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>diameter</td>\n",
       "      <td>diameter</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>extent</td>\n",
       "      <td>extent</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>albedo</td>\n",
       "      <td>albedo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>rot_per</td>\n",
       "      <td>rot_per</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>GM</td>\n",
       "      <td>gm</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>BV</td>\n",
       "      <td>bv</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>UB</td>\n",
       "      <td>ub</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>IR</td>\n",
       "      <td>ir</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>spec_B</td>\n",
       "      <td>spec_b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>spec_T</td>\n",
       "      <td>spec_t</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>G</td>\n",
       "      <td>g</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>moid</td>\n",
       "      <td>moid</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>class</td>\n",
       "      <td>class</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>n</td>\n",
       "      <td>n</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>per</td>\n",
       "      <td>per</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>ma</td>\n",
       "      <td>ma</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     original_name normalized_name\n",
       "0             name            name\n",
       "1                a               a\n",
       "2                e               e\n",
       "3                i               i\n",
       "4               om              om\n",
       "5                w               w\n",
       "6                q               q\n",
       "7               ad              ad\n",
       "8            per_y           per_y\n",
       "9         data_arc        data_arc\n",
       "10  condition_code  condition_code\n",
       "11      n_obs_used      n_obs_used\n",
       "12               H               h\n",
       "13             neo             neo\n",
       "14             pha             pha\n",
       "15        diameter        diameter\n",
       "16          extent          extent\n",
       "17          albedo          albedo\n",
       "18         rot_per         rot_per\n",
       "19              GM              gm\n",
       "20              BV              bv\n",
       "21              UB              ub\n",
       "22              IR              ir\n",
       "23          spec_B          spec_b\n",
       "24          spec_T          spec_t\n",
       "25               G               g\n",
       "26            moid            moid\n",
       "27           class           class\n",
       "28               n               n\n",
       "29             per             per\n",
       "30              ma              ma"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Типы данных после нормализации:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>a</th>\n",
       "      <th>e</th>\n",
       "      <th>i</th>\n",
       "      <th>om</th>\n",
       "      <th>w</th>\n",
       "      <th>q</th>\n",
       "      <th>ad</th>\n",
       "      <th>per_y</th>\n",
       "      <th>data_arc</th>\n",
       "      <th>...</th>\n",
       "      <th>ub</th>\n",
       "      <th>ir</th>\n",
       "      <th>spec_b</th>\n",
       "      <th>spec_t</th>\n",
       "      <th>g</th>\n",
       "      <th>moid</th>\n",
       "      <th>class</th>\n",
       "      <th>n</th>\n",
       "      <th>per</th>\n",
       "      <th>ma</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>dtype</th>\n",
       "      <td>object</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>...</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>object</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "      <td>float64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 31 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         name        a        e        i       om        w        q       ad  \\\n",
       "dtype  object  float64  float64  float64  float64  float64  float64  float64   \n",
       "\n",
       "         per_y data_arc  ...       ub       ir  spec_b  spec_t        g  \\\n",
       "dtype  float64  float64  ...  float64  float64  object  object  float64   \n",
       "\n",
       "          moid   class        n      per       ma  \n",
       "dtype  float64  object  float64  float64  float64  \n",
       "\n",
       "[1 rows x 31 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Нормализация названий столбцов и попытка приведения object-столбцов к числам\n",
    "def normalize_col(col: str) -> str:\n",
    "    col = str(col).strip().lower()\n",
    "    col = col.replace(\"%\", \"pct\")\n",
    "    col = re.sub(r\"[^a-z0-9]+\", \"_\", col)\n",
    "    col = re.sub(r\"_+\", \"_\", col).strip(\"_\")\n",
    "    return col\n",
    "\n",
    "df = df_raw.copy()\n",
    "orig_cols = df.columns.tolist()\n",
    "df.columns = [normalize_col(c) for c in df.columns]\n",
    "\n",
    "col_map = pd.DataFrame({\n",
    "    \"original_name\": orig_cols,\n",
    "    \"normalized_name\": df.columns\n",
    "})\n",
    "display(col_map)\n",
    "\n",
    "for col in df.columns:\n",
    "    if df[col].dtype == \"object\":\n",
    "        cleaned = (\n",
    "            df[col]\n",
    "            .astype(str)\n",
    "            .str.replace(\",\", \"\", regex=False)\n",
    "            .str.strip()\n",
    "            .replace({\"\": np.nan, \"nan\": np.nan, \"none\": np.nan, \"None\": np.nan, \"null\": np.nan})\n",
    "        )\n",
    "        parsed = pd.to_numeric(cleaned, errors=\"coerce\")\n",
    "        share_numeric = parsed.notna().mean()\n",
    "        if share_numeric >= 0.95:\n",
    "            df[col] = parsed\n",
    "\n",
    "print(\"Типы данных после нормализации:\")\n",
    "display(df.dtypes.rename(\"dtype\").to_frame().T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1e02249a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Выбранная зависимая переменная: pha\n",
      "Распределение классов:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>share</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pha</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.997552</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.002448</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        share\n",
       "pha          \n",
       "0    0.997552\n",
       "1    0.002448"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Автоматический выбор зависимой переменной\n",
    "target_candidates = [\n",
    "    \"pha\", \"is_hazardous\", \"hazardous\", \"hazardous_asteroid\",\n",
    "    \"potentially_hazardous\", \"risk\", \"target\"\n",
    "]\n",
    "\n",
    "target_col = None\n",
    "for cand in target_candidates:\n",
    "    if cand in df.columns:\n",
    "        target_col = cand\n",
    "        break\n",
    "\n",
    "if target_col is None:\n",
    "    for col in df.columns:\n",
    "        nun = df[col].nunique(dropna=True)\n",
    "        if nun == 2 and any(key in col for key in [\"haz\", \"risk\", \"target\", \"pha\"]):\n",
    "            target_col = col\n",
    "            break\n",
    "\n",
    "if target_col is None:\n",
    "    raise ValueError(\"Не удалось автоматически определить целевую переменную для бинарной классификации.\")\n",
    "\n",
    "print(\"Выбранная зависимая переменная:\", target_col)\n",
    "\n",
    "target_series = df[target_col].copy()\n",
    "if target_series.dtype == \"object\":\n",
    "    target_clean = (\n",
    "        target_series.astype(str)\n",
    "        .str.strip()\n",
    "        .str.lower()\n",
    "        .replace({\"\": np.nan, \"nan\": np.nan, \"none\": np.nan, \"null\": np.nan})\n",
    "    )\n",
    "    mapping = {\n",
    "        \"y\": 1, \"yes\": 1, \"true\": 1, \"1\": 1, \"t\": 1,\n",
    "        \"n\": 0, \"no\": 0, \"false\": 0, \"0\": 0, \"f\": 0\n",
    "    }\n",
    "    uniq = pd.Series(target_clean.dropna().unique())\n",
    "    if set(uniq.tolist()).issubset(set(mapping.keys())):\n",
    "        y = target_clean.map(mapping)\n",
    "    else:\n",
    "        vals = list(uniq.sort_values())\n",
    "        if len(vals) != 2:\n",
    "            raise ValueError(f\"Целевая переменная {target_col} не является бинарной. Найденные значения: {vals[:10]}\")\n",
    "        y = (target_clean == vals[-1]).astype(float)\n",
    "else:\n",
    "    uniq = sorted(pd.Series(target_series.dropna().unique()).tolist())\n",
    "    if len(uniq) != 2:\n",
    "        raise ValueError(f\"Целевая переменная {target_col} не является бинарной.\")\n",
    "    y = (target_series == uniq[-1]).astype(float)\n",
    "\n",
    "mask = y.notna()\n",
    "df = df.loc[mask].reset_index(drop=True)\n",
    "y = y.loc[mask].astype(int).reset_index(drop=True)\n",
    "\n",
    "print(\"Распределение классов:\")\n",
    "display(y.value_counts(normalize=True).sort_index().rename(\"share\").to_frame())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "faa5136c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Датасет уменьшен до 250000 объектов для устойчивого обучения с сохранением баланса классов.\n"
     ]
    }
   ],
   "source": [
    "# При необходимости уменьшаем объём данных для устойчивого обучения\n",
    "MAX_ROWS_FOR_MODELING = 250_000\n",
    "\n",
    "if len(df) > MAX_ROWS_FOR_MODELING:\n",
    "    tmp = pd.DataFrame({\"target\": y})\n",
    "    sampled_idx = (\n",
    "        tmp.groupby(\"target\", group_keys=False)\n",
    "           .apply(lambda g: g.sample(\n",
    "               n=max(1, int(round(len(g) / len(tmp) * MAX_ROWS_FOR_MODELING))),\n",
    "               random_state=SEED\n",
    "           ))\n",
    "           .index\n",
    "    )\n",
    "    sampled_idx = np.sort(sampled_idx)\n",
    "    df = df.loc[sampled_idx].reset_index(drop=True)\n",
    "    y = y.loc[sampled_idx].reset_index(drop=True)\n",
    "    print(f\"Датасет уменьшен до {len(df)} объектов для устойчивого обучения с сохранением баланса классов.\")\n",
    "else:\n",
    "    print(\"Используется полный датасет.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "4b2d11fb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Удалённые признаки: ['albedo', 'bv', 'diameter', 'extent', 'g', 'gm', 'ir', 'name', 'rot_per', 'spec_b', 'spec_t', 'ub']\n",
      "\n",
      "Число исходных содержательных признаков: 18\n",
      "Числовые признаки: ['a', 'e', 'i', 'om', 'w', 'q', 'ad', 'per_y', 'data_arc', 'condition_code', 'n_obs_used', 'h', 'moid', 'n', 'per', 'ma']\n",
      "Категориальные признаки: ['neo', 'class']\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>e</th>\n",
       "      <th>i</th>\n",
       "      <th>om</th>\n",
       "      <th>w</th>\n",
       "      <th>q</th>\n",
       "      <th>ad</th>\n",
       "      <th>per_y</th>\n",
       "      <th>data_arc</th>\n",
       "      <th>condition_code</th>\n",
       "      <th>n_obs_used</th>\n",
       "      <th>h</th>\n",
       "      <th>moid</th>\n",
       "      <th>n</th>\n",
       "      <th>per</th>\n",
       "      <th>ma</th>\n",
       "      <th>neo</th>\n",
       "      <th>class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.769165</td>\n",
       "      <td>0.076009</td>\n",
       "      <td>10.594067</td>\n",
       "      <td>80.305532</td>\n",
       "      <td>73.597694</td>\n",
       "      <td>2.558684</td>\n",
       "      <td>2.979647</td>\n",
       "      <td>4.608202</td>\n",
       "      <td>8822.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1002</td>\n",
       "      <td>3.34</td>\n",
       "      <td>1.594780</td>\n",
       "      <td>0.213885</td>\n",
       "      <td>1683.145708</td>\n",
       "      <td>77.372096</td>\n",
       "      <td>N</td>\n",
       "      <td>MBA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.385334</td>\n",
       "      <td>0.231206</td>\n",
       "      <td>5.523651</td>\n",
       "      <td>259.563231</td>\n",
       "      <td>145.265106</td>\n",
       "      <td>1.833831</td>\n",
       "      <td>2.936837</td>\n",
       "      <td>3.684105</td>\n",
       "      <td>62452.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>5206</td>\n",
       "      <td>5.51</td>\n",
       "      <td>0.846100</td>\n",
       "      <td>0.267535</td>\n",
       "      <td>1345.619196</td>\n",
       "      <td>140.419656</td>\n",
       "      <td>N</td>\n",
       "      <td>MBA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2.201764</td>\n",
       "      <td>0.156499</td>\n",
       "      <td>5.886955</td>\n",
       "      <td>110.889330</td>\n",
       "      <td>285.287462</td>\n",
       "      <td>1.857190</td>\n",
       "      <td>2.546339</td>\n",
       "      <td>3.267115</td>\n",
       "      <td>62655.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2744</td>\n",
       "      <td>6.49</td>\n",
       "      <td>0.874176</td>\n",
       "      <td>0.301681</td>\n",
       "      <td>1193.313717</td>\n",
       "      <td>194.882895</td>\n",
       "      <td>N</td>\n",
       "      <td>MBA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.141539</td>\n",
       "      <td>0.112461</td>\n",
       "      <td>3.831560</td>\n",
       "      <td>283.202167</td>\n",
       "      <td>312.315206</td>\n",
       "      <td>2.788240</td>\n",
       "      <td>3.494839</td>\n",
       "      <td>5.568291</td>\n",
       "      <td>62175.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3409</td>\n",
       "      <td>5.43</td>\n",
       "      <td>1.778390</td>\n",
       "      <td>0.177007</td>\n",
       "      <td>2033.818284</td>\n",
       "      <td>152.184851</td>\n",
       "      <td>N</td>\n",
       "      <td>MBA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.575981</td>\n",
       "      <td>0.085121</td>\n",
       "      <td>16.536125</td>\n",
       "      <td>43.221913</td>\n",
       "      <td>80.544823</td>\n",
       "      <td>2.356710</td>\n",
       "      <td>2.795252</td>\n",
       "      <td>4.134492</td>\n",
       "      <td>61680.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2385</td>\n",
       "      <td>6.74</td>\n",
       "      <td>1.436330</td>\n",
       "      <td>0.238391</td>\n",
       "      <td>1510.123380</td>\n",
       "      <td>187.488522</td>\n",
       "      <td>N</td>\n",
       "      <td>MBA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          a         e          i          om           w         q        ad  \\\n",
       "0  2.769165  0.076009  10.594067   80.305532   73.597694  2.558684  2.979647   \n",
       "1  2.385334  0.231206   5.523651  259.563231  145.265106  1.833831  2.936837   \n",
       "2  2.201764  0.156499   5.886955  110.889330  285.287462  1.857190  2.546339   \n",
       "3  3.141539  0.112461   3.831560  283.202167  312.315206  2.788240  3.494839   \n",
       "4  2.575981  0.085121  16.536125   43.221913   80.544823  2.356710  2.795252   \n",
       "\n",
       "      per_y  data_arc  condition_code  n_obs_used     h      moid         n  \\\n",
       "0  4.608202    8822.0             0.0        1002  3.34  1.594780  0.213885   \n",
       "1  3.684105   62452.0             0.0        5206  5.51  0.846100  0.267535   \n",
       "2  3.267115   62655.0             0.0        2744  6.49  0.874176  0.301681   \n",
       "3  5.568291   62175.0             0.0        3409  5.43  1.778390  0.177007   \n",
       "4  4.134492   61680.0             0.0        2385  6.74  1.436330  0.238391   \n",
       "\n",
       "           per          ma neo class  \n",
       "0  1683.145708   77.372096   N   MBA  \n",
       "1  1345.619196  140.419656   N   MBA  \n",
       "2  1193.313717  194.882895   N   MBA  \n",
       "3  2033.818284  152.184851   N   MBA  \n",
       "4  1510.123380  187.488522   N   MBA  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Отбор содержательных признаков\n",
    "name_series = df[\"name\"].copy() if \"name\" in df.columns else pd.Series(np.arange(len(df)), name=\"object_name\")\n",
    "\n",
    "X = df.drop(columns=[target_col]).copy()\n",
    "\n",
    "explicit_drop = {\n",
    "    \"name\", \"id\", \"identifier\", \"full_name\", \"spkid\", \"pdes\",\n",
    "    \"orbit_id\", \"extent\"\n",
    "}\n",
    "to_drop = [c for c in X.columns if c in explicit_drop]\n",
    "\n",
    "# Удаляем почти пустые столбцы\n",
    "missing_share = X.isna().mean()\n",
    "to_drop += missing_share[missing_share > 0.80].index.tolist()\n",
    "\n",
    "# Удаляем высококардинальные строковые идентификаторы\n",
    "candidate_keep = {\"neo\", \"class\", \"spec_b\", \"spec_t\"}\n",
    "obj_cols = X.select_dtypes(include=\"object\").columns.tolist()\n",
    "for col in obj_cols:\n",
    "    nun = X[col].nunique(dropna=True)\n",
    "    ratio = nun / max(len(X), 1)\n",
    "    avg_len = X[col].dropna().astype(str).str.len().mean() if X[col].notna().any() else 0\n",
    "    if col in candidate_keep:\n",
    "        continue\n",
    "    if nun > 50 or ratio > 0.10 or avg_len > 25:\n",
    "        to_drop.append(col)\n",
    "\n",
    "to_drop = sorted(set([c for c in to_drop if c in X.columns]))\n",
    "X = X.drop(columns=to_drop, errors=\"ignore\")\n",
    "\n",
    "numeric_features = X.select_dtypes(include=[np.number]).columns.tolist()\n",
    "categorical_features = [c for c in X.columns if c not in numeric_features]\n",
    "\n",
    "cat_keep = []\n",
    "for col in categorical_features:\n",
    "    nun = X[col].nunique(dropna=True)\n",
    "    if 2 <= nun <= 30:\n",
    "        cat_keep.append(col)\n",
    "\n",
    "categorical_features = cat_keep\n",
    "X = X[numeric_features + categorical_features].copy()\n",
    "\n",
    "print(\"Удалённые признаки:\", to_drop)\n",
    "print(\"\\nЧисло исходных содержательных признаков:\", X.shape[1])\n",
    "print(\"Числовые признаки:\", numeric_features)\n",
    "print(\"Категориальные признаки:\", categorical_features)\n",
    "display(X.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "c6931339",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Размеры выборок:\n",
      "train: (150000, 18) (150000,)\n",
      "val: (50000, 18) (50000,)\n",
      "test: (50000, 18) (50000,)\n"
     ]
    }
   ],
   "source": [
    "# Разбиение на train / validation / test\n",
    "X_trainval, X_test, y_trainval, y_test, names_trainval, names_test = train_test_split(\n",
    "    X, y, name_series,\n",
    "    test_size=0.20,\n",
    "    random_state=SEED,\n",
    "    stratify=y\n",
    ")\n",
    "\n",
    "X_train, X_val, y_train, y_val, names_train, names_val = train_test_split(\n",
    "    X_trainval, y_trainval, names_trainval,\n",
    "    test_size=0.25,   # 0.25 * 0.80 = 0.20\n",
    "    random_state=SEED,\n",
    "    stratify=y_trainval\n",
    ")\n",
    "\n",
    "print(\"Размеры выборок:\")\n",
    "print(\"train:\", X_train.shape, y_train.shape)\n",
    "print(\"val:\", X_val.shape, y_val.shape)\n",
    "print(\"test:\", X_test.shape, y_test.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "76262ee1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Число признаков после кодирования: 30\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>num__a</th>\n",
       "      <th>num__e</th>\n",
       "      <th>num__i</th>\n",
       "      <th>num__om</th>\n",
       "      <th>num__w</th>\n",
       "      <th>num__q</th>\n",
       "      <th>num__ad</th>\n",
       "      <th>num__per_y</th>\n",
       "      <th>num__data_arc</th>\n",
       "      <th>num__condition_code</th>\n",
       "      <th>num__n_obs_used</th>\n",
       "      <th>num__h</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.031188</td>\n",
       "      <td>0.806925</td>\n",
       "      <td>-1.128852</td>\n",
       "      <td>0.314669</td>\n",
       "      <td>0.059868</td>\n",
       "      <td>0.023007</td>\n",
       "      <td>0.029438</td>\n",
       "      <td>-0.003820</td>\n",
       "      <td>4.396863</td>\n",
       "      <td>-0.479187</td>\n",
       "      <td>2.890778</td>\n",
       "      <td>-1.740824</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.016760</td>\n",
       "      <td>-0.608113</td>\n",
       "      <td>-0.471353</td>\n",
       "      <td>-0.340761</td>\n",
       "      <td>-1.125770</td>\n",
       "      <td>0.030330</td>\n",
       "      <td>-0.020955</td>\n",
       "      <td>-0.006358</td>\n",
       "      <td>-0.012200</td>\n",
       "      <td>-0.479187</td>\n",
       "      <td>-0.415983</td>\n",
       "      <td>0.067917</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.032497</td>\n",
       "      <td>-0.483232</td>\n",
       "      <td>1.726338</td>\n",
       "      <td>-1.054857</td>\n",
       "      <td>-0.084224</td>\n",
       "      <td>0.203822</td>\n",
       "      <td>0.009041</td>\n",
       "      <td>-0.003748</td>\n",
       "      <td>-0.408712</td>\n",
       "      <td>-0.479187</td>\n",
       "      <td>-0.558353</td>\n",
       "      <td>-0.315755</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.060804</td>\n",
       "      <td>-0.890887</td>\n",
       "      <td>-0.376364</td>\n",
       "      <td>1.356538</td>\n",
       "      <td>-0.673594</td>\n",
       "      <td>-0.112042</td>\n",
       "      <td>-0.049310</td>\n",
       "      <td>-0.008512</td>\n",
       "      <td>0.492364</td>\n",
       "      <td>-0.479187</td>\n",
       "      <td>0.148128</td>\n",
       "      <td>-0.151324</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.047200</td>\n",
       "      <td>1.479458</td>\n",
       "      <td>0.571230</td>\n",
       "      <td>-0.802631</td>\n",
       "      <td>0.340934</td>\n",
       "      <td>-0.305853</td>\n",
       "      <td>-0.011950</td>\n",
       "      <td>-0.007866</td>\n",
       "      <td>-1.360339</td>\n",
       "      <td>3.006464</td>\n",
       "      <td>-0.673862</td>\n",
       "      <td>0.719612</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     num__a    num__e    num__i   num__om    num__w    num__q   num__ad  \\\n",
       "0  0.031188  0.806925 -1.128852  0.314669  0.059868  0.023007  0.029438   \n",
       "1 -0.016760 -0.608113 -0.471353 -0.340761 -1.125770  0.030330 -0.020955   \n",
       "2  0.032497 -0.483232  1.726338 -1.054857 -0.084224  0.203822  0.009041   \n",
       "3 -0.060804 -0.890887 -0.376364  1.356538 -0.673594 -0.112042 -0.049310   \n",
       "4 -0.047200  1.479458  0.571230 -0.802631  0.340934 -0.305853 -0.011950   \n",
       "\n",
       "   num__per_y  num__data_arc  num__condition_code  num__n_obs_used    num__h  \n",
       "0   -0.003820       4.396863            -0.479187         2.890778 -1.740824  \n",
       "1   -0.006358      -0.012200            -0.479187        -0.415983  0.067917  \n",
       "2   -0.003748      -0.408712            -0.479187        -0.558353 -0.315755  \n",
       "3   -0.008512       0.492364            -0.479187         0.148128 -0.151324  \n",
       "4   -0.007866      -1.360339             3.006464        -0.673862  0.719612  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Предобработка\n",
    "try:\n",
    "    ohe = OneHotEncoder(handle_unknown=\"ignore\", sparse_output=False)\n",
    "except TypeError:\n",
    "    ohe = OneHotEncoder(handle_unknown=\"ignore\", sparse=False)\n",
    "\n",
    "num_pipe = Pipeline([\n",
    "    (\"imputer\", SimpleImputer(strategy=\"median\")),\n",
    "    (\"scaler\", StandardScaler())\n",
    "])\n",
    "\n",
    "cat_pipe = Pipeline([\n",
    "    (\"imputer\", SimpleImputer(strategy=\"most_frequent\")),\n",
    "    (\"ohe\", ohe)\n",
    "])\n",
    "\n",
    "preprocessor = ColumnTransformer(\n",
    "    transformers=[\n",
    "        (\"num\", num_pipe, numeric_features),\n",
    "        (\"cat\", cat_pipe, categorical_features)\n",
    "    ],\n",
    "    remainder=\"drop\"\n",
    ")\n",
    "\n",
    "X_train_p = preprocessor.fit_transform(X_train).astype(np.float32)\n",
    "X_val_p = preprocessor.transform(X_val).astype(np.float32)\n",
    "X_test_p = preprocessor.transform(X_test).astype(np.float32)\n",
    "X_trainval_p = preprocessor.transform(X_trainval).astype(np.float32)\n",
    "\n",
    "feature_names = preprocessor.get_feature_names_out().tolist()\n",
    "\n",
    "print(\"Число признаков после кодирования:\", X_train_p.shape[1])\n",
    "display(pd.DataFrame(X_train_p[:5, :min(12, X_train_p.shape[1])], columns=feature_names[:min(12, len(feature_names))]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "193784e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Вспомогательные функции\n",
    "def evaluate_binary(y_true, y_pred, y_proba):\n",
    "    out = {\n",
    "        \"accuracy\": accuracy_score(y_true, y_pred),\n",
    "        \"balanced_accuracy\": balanced_accuracy_score(y_true, y_pred),\n",
    "        \"precision_macro\": precision_score(y_true, y_pred, average=\"macro\", zero_division=0),\n",
    "        \"recall_macro\": recall_score(y_true, y_pred, average=\"macro\", zero_division=0),\n",
    "        \"f1_macro\": f1_score(y_true, y_pred, average=\"macro\", zero_division=0),\n",
    "    }\n",
    "    try:\n",
    "        out[\"roc_auc\"] = roc_auc_score(y_true, y_proba)\n",
    "    except Exception:\n",
    "        out[\"roc_auc\"] = np.nan\n",
    "    return out\n",
    "\n",
    "def choose_sigma_from_centers(centers):\n",
    "    if len(centers) <= 1:\n",
    "        return 1.0\n",
    "    d = pairwise_distances(centers)\n",
    "    d[d == 0] = np.nan\n",
    "    nearest = np.nanmin(d, axis=1)\n",
    "    nearest = nearest[np.isfinite(nearest)]\n",
    "    if len(nearest) == 0:\n",
    "        return 1.0\n",
    "    return float(np.nanmedian(nearest))\n",
    "\n",
    "class PrototypeRBFClassifier:\n",
    "    def __init__(self, n_prototypes_per_class=4, sigma_multiplier=1.0, C=1.0, random_state=42):\n",
    "        self.n_prototypes_per_class = n_prototypes_per_class\n",
    "        self.sigma_multiplier = sigma_multiplier\n",
    "        self.C = C\n",
    "        self.random_state = random_state\n",
    "\n",
    "    def _build_prototypes(self, X, y):\n",
    "        centers = []\n",
    "        labels = []\n",
    "        unique_classes = np.sort(np.unique(y))\n",
    "        for cls in unique_classes:\n",
    "            Xc = X[y == cls]\n",
    "            n_clusters = min(self.n_prototypes_per_class, len(Xc))\n",
    "            if n_clusters <= 0:\n",
    "                continue\n",
    "            km = MiniBatchKMeans(\n",
    "                n_clusters=n_clusters,\n",
    "                random_state=self.random_state,\n",
    "                batch_size=min(4096, max(128, len(Xc))),\n",
    "                n_init=5\n",
    "            )\n",
    "            km.fit(Xc)\n",
    "            centers.append(km.cluster_centers_)\n",
    "            labels.extend([cls] * n_clusters)\n",
    "        centers = np.vstack(centers).astype(np.float32)\n",
    "        labels = np.array(labels)\n",
    "        return centers, labels\n",
    "\n",
    "    def _rbf(self, X):\n",
    "        d2 = pairwise_distances(X, self.prototypes_, metric=\"euclidean\") ** 2\n",
    "        denom = max(self.sigma_ ** 2, 1e-8) * 2.0\n",
    "        return np.exp(-d2 / denom).astype(np.float32)\n",
    "\n",
    "    def fit(self, X, y):\n",
    "        X = np.asarray(X, dtype=np.float32)\n",
    "        y = np.asarray(y)\n",
    "        self.prototypes_, self.prototype_labels_ = self._build_prototypes(X, y)\n",
    "        self.sigma_base_ = choose_sigma_from_centers(self.prototypes_)\n",
    "        self.sigma_ = float(max(self.sigma_base_ * self.sigma_multiplier, 1e-6))\n",
    "\n",
    "        Z = self._rbf(X)\n",
    "        self.output_layer_ = LogisticRegression(\n",
    "            max_iter=1000,\n",
    "            C=self.C,\n",
    "            class_weight=\"balanced\",\n",
    "            random_state=self.random_state\n",
    "        )\n",
    "        self.output_layer_.fit(Z, y)\n",
    "\n",
    "        self.assign_idx_, self.assign_dist_ = pairwise_distances_argmin_min(X, self.prototypes_)\n",
    "        self.train_support_ = pd.Series(self.assign_idx_).value_counts().sort_index()\n",
    "        return self\n",
    "\n",
    "    def predict_proba(self, X):\n",
    "        Z = self._rbf(np.asarray(X, dtype=np.float32))\n",
    "        return self.output_layer_.predict_proba(Z)\n",
    "\n",
    "    def predict(self, X):\n",
    "        return self.predict_proba(X)[:, 1] >= 0.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "ca93866f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Топ-10 конфигураций:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>n_prototypes_per_class</th>\n",
       "      <th>total_prototypes</th>\n",
       "      <th>sigma_multiplier</th>\n",
       "      <th>sigma</th>\n",
       "      <th>C</th>\n",
       "      <th>val_accuracy</th>\n",
       "      <th>val_balanced_accuracy</th>\n",
       "      <th>val_precision_macro</th>\n",
       "      <th>val_recall_macro</th>\n",
       "      <th>val_f1_macro</th>\n",
       "      <th>val_roc_auc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.948137</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97934</td>\n",
       "      <td>0.989645</td>\n",
       "      <td>0.552814</td>\n",
       "      <td>0.989645</td>\n",
       "      <td>0.590305</td>\n",
       "      <td>0.996080</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.948137</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.97890</td>\n",
       "      <td>0.989424</td>\n",
       "      <td>0.551827</td>\n",
       "      <td>0.989424</td>\n",
       "      <td>0.588574</td>\n",
       "      <td>0.996067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.807996</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97860</td>\n",
       "      <td>0.989274</td>\n",
       "      <td>0.551174</td>\n",
       "      <td>0.989274</td>\n",
       "      <td>0.587425</td>\n",
       "      <td>0.995346</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.948137</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0.97824</td>\n",
       "      <td>0.989093</td>\n",
       "      <td>0.550413</td>\n",
       "      <td>0.989093</td>\n",
       "      <td>0.586078</td>\n",
       "      <td>0.995906</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.903998</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97822</td>\n",
       "      <td>0.984995</td>\n",
       "      <td>0.550031</td>\n",
       "      <td>0.984995</td>\n",
       "      <td>0.585391</td>\n",
       "      <td>0.995502</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.807996</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.97802</td>\n",
       "      <td>0.988983</td>\n",
       "      <td>0.549959</td>\n",
       "      <td>0.988983</td>\n",
       "      <td>0.585272</td>\n",
       "      <td>0.995145</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.745638</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97786</td>\n",
       "      <td>0.988903</td>\n",
       "      <td>0.549634</td>\n",
       "      <td>0.988903</td>\n",
       "      <td>0.584693</td>\n",
       "      <td>0.994570</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.903998</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.97770</td>\n",
       "      <td>0.988823</td>\n",
       "      <td>0.549313</td>\n",
       "      <td>0.988823</td>\n",
       "      <td>0.584120</td>\n",
       "      <td>0.995309</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.807996</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0.97768</td>\n",
       "      <td>0.988813</td>\n",
       "      <td>0.549273</td>\n",
       "      <td>0.988813</td>\n",
       "      <td>0.584049</td>\n",
       "      <td>0.994914</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>2.0</td>\n",
       "      <td>7.491276</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97758</td>\n",
       "      <td>0.988763</td>\n",
       "      <td>0.549075</td>\n",
       "      <td>0.988763</td>\n",
       "      <td>0.583695</td>\n",
       "      <td>0.993315</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   n_prototypes_per_class  total_prototypes  sigma_multiplier     sigma    C  \\\n",
       "0                       5                10               2.0  5.948137  2.0   \n",
       "1                       5                10               2.0  5.948137  1.0   \n",
       "2                       4                 8               2.0  5.807996  2.0   \n",
       "3                       5                10               2.0  5.948137  0.5   \n",
       "4                       4                 8               1.0  2.903998  2.0   \n",
       "5                       4                 8               2.0  5.807996  1.0   \n",
       "6                       3                 6               1.0  3.745638  2.0   \n",
       "7                       4                 8               1.0  2.903998  1.0   \n",
       "8                       4                 8               2.0  5.807996  0.5   \n",
       "9                       3                 6               2.0  7.491276  2.0   \n",
       "\n",
       "   val_accuracy  val_balanced_accuracy  val_precision_macro  val_recall_macro  \\\n",
       "0       0.97934               0.989645             0.552814          0.989645   \n",
       "1       0.97890               0.989424             0.551827          0.989424   \n",
       "2       0.97860               0.989274             0.551174          0.989274   \n",
       "3       0.97824               0.989093             0.550413          0.989093   \n",
       "4       0.97822               0.984995             0.550031          0.984995   \n",
       "5       0.97802               0.988983             0.549959          0.988983   \n",
       "6       0.97786               0.988903             0.549634          0.988903   \n",
       "7       0.97770               0.988823             0.549313          0.988823   \n",
       "8       0.97768               0.988813             0.549273          0.988813   \n",
       "9       0.97758               0.988763             0.549075          0.988763   \n",
       "\n",
       "   val_f1_macro  val_roc_auc  \n",
       "0      0.590305     0.996080  \n",
       "1      0.588574     0.996067  \n",
       "2      0.587425     0.995346  \n",
       "3      0.586078     0.995906  \n",
       "4      0.585391     0.995502  \n",
       "5      0.585272     0.995145  \n",
       "6      0.584693     0.994570  \n",
       "7      0.584120     0.995309  \n",
       "8      0.584049     0.994914  \n",
       "9      0.583695     0.993315  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Подбор гиперпараметров прототипной сети по validation\n",
    "n_proto_grid = [2, 3, 4, 5]\n",
    "sigma_mult_grid = [0.5, 1.0, 2.0]\n",
    "C_grid = [0.5, 1.0, 2.0]\n",
    "\n",
    "search_rows = []\n",
    "best_model = None\n",
    "best_tuple = None\n",
    "\n",
    "for n_proto in n_proto_grid:\n",
    "    for sigma_mult in sigma_mult_grid:\n",
    "        for C in C_grid:\n",
    "            model = PrototypeRBFClassifier(\n",
    "                n_prototypes_per_class=n_proto,\n",
    "                sigma_multiplier=sigma_mult,\n",
    "                C=C,\n",
    "                random_state=SEED\n",
    "            )\n",
    "            model.fit(X_train_p, y_train.values)\n",
    "\n",
    "            val_proba = model.predict_proba(X_val_p)[:, 1]\n",
    "            val_pred = (val_proba >= 0.5).astype(int)\n",
    "            val_metrics = evaluate_binary(y_val.values, val_pred, val_proba)\n",
    "\n",
    "            row = {\n",
    "                \"n_prototypes_per_class\": n_proto,\n",
    "                \"total_prototypes\": len(model.prototypes_),\n",
    "                \"sigma_multiplier\": sigma_mult,\n",
    "                \"sigma\": model.sigma_,\n",
    "                \"C\": C,\n",
    "                **{f\"val_{k}\": v for k, v in val_metrics.items()}\n",
    "            }\n",
    "            search_rows.append(row)\n",
    "\n",
    "            candidate_tuple = (\n",
    "                val_metrics[\"f1_macro\"],\n",
    "                val_metrics[\"balanced_accuracy\"],\n",
    "                val_metrics[\"roc_auc\"] if pd.notna(val_metrics[\"roc_auc\"]) else -np.inf\n",
    "            )\n",
    "            if best_tuple is None or candidate_tuple > best_tuple:\n",
    "                best_tuple = candidate_tuple\n",
    "                best_model = model\n",
    "\n",
    "search_df = pd.DataFrame(search_rows).sort_values(\n",
    "    [\"val_f1_macro\", \"val_balanced_accuracy\", \"val_roc_auc\"],\n",
    "    ascending=False\n",
    ").reset_index(drop=True)\n",
    "\n",
    "print(\"Топ-10 конфигураций:\")\n",
    "display(search_df.head(10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "accbc804",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Метрики модели:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>split</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>balanced_accuracy</th>\n",
       "      <th>precision_macro</th>\n",
       "      <th>recall_macro</th>\n",
       "      <th>f1_macro</th>\n",
       "      <th>roc_auc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>trainval</td>\n",
       "      <td>0.979375</td>\n",
       "      <td>0.988644</td>\n",
       "      <td>0.553000</td>\n",
       "      <td>0.988644</td>\n",
       "      <td>0.590603</td>\n",
       "      <td>0.995785</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>test</td>\n",
       "      <td>0.979320</td>\n",
       "      <td>0.985546</td>\n",
       "      <td>0.552416</td>\n",
       "      <td>0.985546</td>\n",
       "      <td>0.589591</td>\n",
       "      <td>0.996248</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      split  accuracy  balanced_accuracy  precision_macro  recall_macro  \\\n",
       "0  trainval  0.979375           0.988644         0.553000      0.988644   \n",
       "1      test  0.979320           0.985546         0.552416      0.985546   \n",
       "\n",
       "   f1_macro   roc_auc  \n",
       "0  0.590603  0.995785  \n",
       "1  0.589591  0.996248  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Матрица ошибок на test:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>48845</td>\n",
       "      <td>1033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       0     1\n",
       "0  48845  1033\n",
       "1      1   121"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Классификационный отчёт на test:\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0     1.0000    0.9793    0.9895     49878\n",
      "           1     0.1049    0.9918    0.1897       122\n",
      "\n",
      "    accuracy                         0.9793     50000\n",
      "   macro avg     0.5524    0.9855    0.5896     50000\n",
      "weighted avg     0.9978    0.9793    0.9876     50000\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Финальная модель: переобучение на train+validation с лучшими параметрами\n",
    "best_params = search_df.iloc[0].to_dict()\n",
    "\n",
    "final_model = PrototypeRBFClassifier(\n",
    "    n_prototypes_per_class=int(best_params[\"n_prototypes_per_class\"]),\n",
    "    sigma_multiplier=float(best_params[\"sigma_multiplier\"]),\n",
    "    C=float(best_params[\"C\"]),\n",
    "    random_state=SEED\n",
    ")\n",
    "final_model.fit(X_trainval_p, y_trainval.values)\n",
    "\n",
    "# Метрики\n",
    "trainval_proba = final_model.predict_proba(X_trainval_p)[:, 1]\n",
    "trainval_pred = (trainval_proba >= 0.5).astype(int)\n",
    "\n",
    "test_proba = final_model.predict_proba(X_test_p)[:, 1]\n",
    "test_pred = (test_proba >= 0.5).astype(int)\n",
    "\n",
    "metrics_trainval = evaluate_binary(y_trainval.values, trainval_pred, trainval_proba)\n",
    "metrics_test = evaluate_binary(y_test.values, test_pred, test_proba)\n",
    "\n",
    "results_df = pd.DataFrame([\n",
    "    {\"split\": \"trainval\", **metrics_trainval},\n",
    "    {\"split\": \"test\", **metrics_test}\n",
    "])\n",
    "\n",
    "print(\"Метрики модели:\")\n",
    "display(results_df)\n",
    "print(\"Матрица ошибок на test:\")\n",
    "cm = confusion_matrix(y_test.values, test_pred)\n",
    "display(pd.DataFrame(cm))\n",
    "print(\"Классификационный отчёт на test:\")\n",
    "print(classification_report(y_test.values, test_pred, digits=4, zero_division=0))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "1b7d4810",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Топ-10 признаков по permutation importance:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature</th>\n",
       "      <th>permutation_importance_mean</th>\n",
       "      <th>permutation_importance_std</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>e</td>\n",
       "      <td>0.047664</td>\n",
       "      <td>0.002647</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>condition_code</td>\n",
       "      <td>0.007815</td>\n",
       "      <td>0.000350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>data_arc</td>\n",
       "      <td>0.003683</td>\n",
       "      <td>0.000315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>n</td>\n",
       "      <td>0.002203</td>\n",
       "      <td>0.003038</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>0.001505</td>\n",
       "      <td>0.000192</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>ma</td>\n",
       "      <td>0.000422</td>\n",
       "      <td>0.000162</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>w</td>\n",
       "      <td>0.000026</td>\n",
       "      <td>0.000099</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>per</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>per_y</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>ad</td>\n",
       "      <td>-0.000026</td>\n",
       "      <td>0.000037</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          feature  permutation_importance_mean  permutation_importance_std\n",
       "0               e                     0.047664                    0.002647\n",
       "1  condition_code                     0.007815                    0.000350\n",
       "2        data_arc                     0.003683                    0.000315\n",
       "3               n                     0.002203                    0.003038\n",
       "4      n_obs_used                     0.001505                    0.000192\n",
       "5              ma                     0.000422                    0.000162\n",
       "6               w                     0.000026                    0.000099\n",
       "7             per                     0.000000                    0.000000\n",
       "8           per_y                     0.000000                    0.000000\n",
       "9              ad                    -0.000026                    0.000037"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Глобальная важность исходных признаков методом перестановок\n",
    "def score_from_raw(X_raw_df, y_true):\n",
    "    Xp = preprocessor.transform(X_raw_df).astype(np.float32)\n",
    "    proba = final_model.predict_proba(Xp)[:, 1]\n",
    "    pred = (proba >= 0.5).astype(int)\n",
    "    return f1_score(y_true, pred, average=\"macro\", zero_division=0)\n",
    "\n",
    "baseline_f1 = score_from_raw(X_test.copy(), y_test.values)\n",
    "perm_rows = []\n",
    "n_repeats = 3\n",
    "rng = np.random.default_rng(SEED)\n",
    "\n",
    "for col in X_test.columns:\n",
    "    scores = []\n",
    "    for _ in range(n_repeats):\n",
    "        X_perm = X_test.copy()\n",
    "        X_perm[col] = rng.permutation(X_perm[col].values)\n",
    "        s = score_from_raw(X_perm, y_test.values)\n",
    "        scores.append(baseline_f1 - s)\n",
    "    perm_rows.append({\n",
    "        \"feature\": col,\n",
    "        \"permutation_importance_mean\": float(np.mean(scores)),\n",
    "        \"permutation_importance_std\": float(np.std(scores))\n",
    "    })\n",
    "\n",
    "perm_df = pd.DataFrame(perm_rows).sort_values(\n",
    "    \"permutation_importance_mean\", ascending=False\n",
    ").reset_index(drop=True)\n",
    "\n",
    "print(\"Топ-10 признаков по permutation importance:\")\n",
    "display(perm_df.head(10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "fddae509",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Основные прототипы:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>prototype_id</th>\n",
       "      <th>prototype_class</th>\n",
       "      <th>support_n</th>\n",
       "      <th>support_share_trainval</th>\n",
       "      <th>prototype_output_weight</th>\n",
       "      <th>impact_score</th>\n",
       "      <th>assigned_positive_rate_trainval</th>\n",
       "      <th>nearest_objects</th>\n",
       "      <th>e</th>\n",
       "      <th>condition_code</th>\n",
       "      <th>data_arc</th>\n",
       "      <th>n</th>\n",
       "      <th>n_obs_used</th>\n",
       "      <th>ma</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>27675</td>\n",
       "      <td>0.138375</td>\n",
       "      <td>-22.340298</td>\n",
       "      <td>3.091339</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.173780</td>\n",
       "      <td>6.712051</td>\n",
       "      <td>253.674230</td>\n",
       "      <td>0.237302</td>\n",
       "      <td>18.095537</td>\n",
       "      <td>181.147050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>71765</td>\n",
       "      <td>0.358825</td>\n",
       "      <td>-3.436301</td>\n",
       "      <td>1.233031</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.143826</td>\n",
       "      <td>0.187250</td>\n",
       "      <td>5927.571853</td>\n",
       "      <td>0.226685</td>\n",
       "      <td>199.988755</td>\n",
       "      <td>189.886354</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>48725</td>\n",
       "      <td>0.243625</td>\n",
       "      <td>-3.356772</td>\n",
       "      <td>0.817794</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.155231</td>\n",
       "      <td>0.237537</td>\n",
       "      <td>5965.458902</td>\n",
       "      <td>0.233649</td>\n",
       "      <td>198.257712</td>\n",
       "      <td>162.908708</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>19161</td>\n",
       "      <td>0.095805</td>\n",
       "      <td>6.953126</td>\n",
       "      <td>0.666144</td>\n",
       "      <td>0.000052</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.137637</td>\n",
       "      <td>0.206524</td>\n",
       "      <td>12629.802255</td>\n",
       "      <td>0.217945</td>\n",
       "      <td>1149.485048</td>\n",
       "      <td>176.631783</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>27221</td>\n",
       "      <td>0.136105</td>\n",
       "      <td>-3.970871</td>\n",
       "      <td>0.540455</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.124196</td>\n",
       "      <td>0.236619</td>\n",
       "      <td>6007.014474</td>\n",
       "      <td>0.235234</td>\n",
       "      <td>199.583740</td>\n",
       "      <td>192.906197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>2012</td>\n",
       "      <td>0.010060</td>\n",
       "      <td>10.944390</td>\n",
       "      <td>0.110101</td>\n",
       "      <td>0.086481</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.594384</td>\n",
       "      <td>5.188867</td>\n",
       "      <td>1152.310966</td>\n",
       "      <td>0.264277</td>\n",
       "      <td>92.923956</td>\n",
       "      <td>172.768315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>1237</td>\n",
       "      <td>0.006185</td>\n",
       "      <td>11.830092</td>\n",
       "      <td>0.073169</td>\n",
       "      <td>0.099434</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.457377</td>\n",
       "      <td>4.561035</td>\n",
       "      <td>1747.569672</td>\n",
       "      <td>0.455755</td>\n",
       "      <td>109.617623</td>\n",
       "      <td>176.214085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>817</td>\n",
       "      <td>0.004085</td>\n",
       "      <td>8.297359</td>\n",
       "      <td>0.033895</td>\n",
       "      <td>0.083231</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.370235</td>\n",
       "      <td>4.069767</td>\n",
       "      <td>2102.944717</td>\n",
       "      <td>0.508083</td>\n",
       "      <td>116.062424</td>\n",
       "      <td>174.708108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>190</td>\n",
       "      <td>0.000950</td>\n",
       "      <td>7.401795</td>\n",
       "      <td>0.007032</td>\n",
       "      <td>0.178947</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.410074</td>\n",
       "      <td>4.278947</td>\n",
       "      <td>1910.597884</td>\n",
       "      <td>1.374819</td>\n",
       "      <td>152.363158</td>\n",
       "      <td>172.634906</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>1197</td>\n",
       "      <td>0.005985</td>\n",
       "      <td>0.684461</td>\n",
       "      <td>0.004096</td>\n",
       "      <td>0.075188</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.247385</td>\n",
       "      <td>5.035923</td>\n",
       "      <td>1103.224767</td>\n",
       "      <td>0.835589</td>\n",
       "      <td>86.230576</td>\n",
       "      <td>176.160290</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   prototype_id  prototype_class  support_n  support_share_trainval  \\\n",
       "0             3                0      27675                0.138375   \n",
       "1             0                0      71765                0.358825   \n",
       "2             4                0      48725                0.243625   \n",
       "3             2                0      19161                0.095805   \n",
       "4             1                0      27221                0.136105   \n",
       "5             5                1       2012                0.010060   \n",
       "6             7                1       1237                0.006185   \n",
       "7             9                1        817                0.004085   \n",
       "8             8                1        190                0.000950   \n",
       "9             6                1       1197                0.005985   \n",
       "\n",
       "   prototype_output_weight  impact_score  assigned_positive_rate_trainval  \\\n",
       "0               -22.340298      3.091339                         0.000000   \n",
       "1                -3.436301      1.233031                         0.000000   \n",
       "2                -3.356772      0.817794                         0.000000   \n",
       "3                 6.953126      0.666144                         0.000052   \n",
       "4                -3.970871      0.540455                         0.000000   \n",
       "5                10.944390      0.110101                         0.086481   \n",
       "6                11.830092      0.073169                         0.099434   \n",
       "7                 8.297359      0.033895                         0.083231   \n",
       "8                 7.401795      0.007032                         0.178947   \n",
       "9                 0.684461      0.004096                         0.075188   \n",
       "\n",
       "  nearest_objects         e  condition_code      data_arc         n  \\\n",
       "0   nan, nan, nan  0.173780        6.712051    253.674230  0.237302   \n",
       "1   nan, nan, nan  0.143826        0.187250   5927.571853  0.226685   \n",
       "2   nan, nan, nan  0.155231        0.237537   5965.458902  0.233649   \n",
       "3   nan, nan, nan  0.137637        0.206524  12629.802255  0.217945   \n",
       "4   nan, nan, nan  0.124196        0.236619   6007.014474  0.235234   \n",
       "5   nan, nan, nan  0.594384        5.188867   1152.310966  0.264277   \n",
       "6   nan, nan, nan  0.457377        4.561035   1747.569672  0.455755   \n",
       "7   nan, nan, nan  0.370235        4.069767   2102.944717  0.508083   \n",
       "8   nan, nan, nan  0.410074        4.278947   1910.597884  1.374819   \n",
       "9   nan, nan, nan  0.247385        5.035923   1103.224767  0.835589   \n",
       "\n",
       "    n_obs_used          ma  \n",
       "0    18.095537  181.147050  \n",
       "1   199.988755  189.886354  \n",
       "2   198.257712  162.908708  \n",
       "3  1149.485048  176.631783  \n",
       "4   199.583740  192.906197  \n",
       "5    92.923956  172.768315  \n",
       "6   109.617623  176.214085  \n",
       "7   116.062424  174.708108  \n",
       "8   152.363158  172.634906  \n",
       "9    86.230576  176.160290  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Основные прототипы: поддержка, влияние, ближайшие объекты и интерпретация значений признаков\n",
    "assign_idx_trainval, assign_dist_trainval = pairwise_distances_argmin_min(X_trainval_p, final_model.prototypes_)\n",
    "support_counts = pd.Series(assign_idx_trainval).value_counts().sort_index()\n",
    "support_share = support_counts / len(X_trainval)\n",
    "\n",
    "coef = final_model.output_layer_.coef_.ravel()\n",
    "impact = np.abs(coef) * support_share.reindex(range(len(coef)), fill_value=0).values\n",
    "\n",
    "# Для интерпретации берём 6 наиболее важных исходных признаков\n",
    "top_proto_features = perm_df.head(6)[\"feature\"].tolist()\n",
    "\n",
    "prototype_rows = []\n",
    "for pid in range(len(final_model.prototypes_)):\n",
    "    mask = assign_idx_trainval == pid\n",
    "    subset_X = X_trainval.loc[mask]\n",
    "    subset_y = y_trainval.loc[mask]\n",
    "    subset_names = names_trainval.loc[mask] if isinstance(names_trainval, pd.Series) else pd.Series(names_trainval)[mask]\n",
    "\n",
    "    nearest_idx = np.argsort(np.where(mask, assign_dist_trainval, np.inf))[:3]\n",
    "    nearest_objects = \", \".join(pd.Series(names_trainval).iloc[nearest_idx].astype(str).tolist())\n",
    "\n",
    "    row = {\n",
    "        \"prototype_id\": pid,\n",
    "        \"prototype_class\": int(final_model.prototype_labels_[pid]),\n",
    "        \"support_n\": int(mask.sum()),\n",
    "        \"support_share_trainval\": float(mask.mean()),\n",
    "        \"prototype_output_weight\": float(coef[pid]),\n",
    "        \"impact_score\": float(impact[pid]),\n",
    "        \"assigned_positive_rate_trainval\": float(subset_y.mean()) if len(subset_y) else np.nan,\n",
    "        \"nearest_objects\": nearest_objects\n",
    "    }\n",
    "    for feat in top_proto_features:\n",
    "        if feat in subset_X.columns:\n",
    "            if pd.api.types.is_numeric_dtype(subset_X[feat]):\n",
    "                row[feat] = float(subset_X[feat].mean()) if len(subset_X) else np.nan\n",
    "            else:\n",
    "                row[feat] = subset_X[feat].mode(dropna=True).iloc[0] if len(subset_X) and subset_X[feat].notna().any() else np.nan\n",
    "    prototype_rows.append(row)\n",
    "\n",
    "prototype_df = pd.DataFrame(prototype_rows).sort_values(\n",
    "    [\"impact_score\", \"support_n\"], ascending=False\n",
    ").reset_index(drop=True)\n",
    "\n",
    "print(\"Основные прототипы:\")\n",
    "display(prototype_df.head(10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "4d7d44d5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Топ-5 вкладов признаков для основных прототипов:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>prototype_id</th>\n",
       "      <th>feature</th>\n",
       "      <th>contribution_score</th>\n",
       "      <th>interpreted_value</th>\n",
       "      <th>rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>47.354488</td>\n",
       "      <td>6.712051</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>data_arc</td>\n",
       "      <td>29.003140</td>\n",
       "      <td>253.67423</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>h</td>\n",
       "      <td>15.000282</td>\n",
       "      <td>18.005536</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>14.650509</td>\n",
       "      <td>18.095537</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3</td>\n",
       "      <td>i</td>\n",
       "      <td>5.876609</td>\n",
       "      <td>10.6704</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>om</td>\n",
       "      <td>2.823642</td>\n",
       "      <td>84.040274</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>1.396686</td>\n",
       "      <td>0.18725</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>w</td>\n",
       "      <td>0.829923</td>\n",
       "      <td>205.94687</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>0.577087</td>\n",
       "      <td>199.988755</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>n</td>\n",
       "      <td>0.460764</td>\n",
       "      <td>0.226685</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>4</td>\n",
       "      <td>w</td>\n",
       "      <td>2.976050</td>\n",
       "      <td>88.600346</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>4</td>\n",
       "      <td>om</td>\n",
       "      <td>2.472264</td>\n",
       "      <td>244.593422</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>4</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>1.299007</td>\n",
       "      <td>0.237537</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>4</td>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>0.579315</td>\n",
       "      <td>198.257712</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>4</td>\n",
       "      <td>ma</td>\n",
       "      <td>0.565179</td>\n",
       "      <td>162.908708</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>2</td>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>16.539234</td>\n",
       "      <td>1149.485048</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>2</td>\n",
       "      <td>data_arc</td>\n",
       "      <td>11.543415</td>\n",
       "      <td>12629.802255</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>2</td>\n",
       "      <td>h</td>\n",
       "      <td>10.733020</td>\n",
       "      <td>13.951356</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>2</td>\n",
       "      <td>q</td>\n",
       "      <td>4.311334</td>\n",
       "      <td>3.780162</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>2</td>\n",
       "      <td>moid</td>\n",
       "      <td>4.298770</td>\n",
       "      <td>2.791547</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>1</td>\n",
       "      <td>w</td>\n",
       "      <td>3.681780</td>\n",
       "      <td>277.27254</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>1</td>\n",
       "      <td>om</td>\n",
       "      <td>3.497154</td>\n",
       "      <td>259.451502</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>1.538064</td>\n",
       "      <td>0.236619</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1</td>\n",
       "      <td>e</td>\n",
       "      <td>1.335589</td>\n",
       "      <td>0.124196</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1</td>\n",
       "      <td>i</td>\n",
       "      <td>0.946616</td>\n",
       "      <td>7.349716</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>5</td>\n",
       "      <td>e</td>\n",
       "      <td>51.163396</td>\n",
       "      <td>0.594384</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>5</td>\n",
       "      <td>h</td>\n",
       "      <td>29.822758</td>\n",
       "      <td>21.763982</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>5</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>16.744631</td>\n",
       "      <td>5.188867</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>5</td>\n",
       "      <td>data_arc</td>\n",
       "      <td>11.857491</td>\n",
       "      <td>1152.310966</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>5</td>\n",
       "      <td>neo</td>\n",
       "      <td>9.156986</td>\n",
       "      <td>Y</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>7</td>\n",
       "      <td>h</td>\n",
       "      <td>39.831296</td>\n",
       "      <td>22.939038</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>7</td>\n",
       "      <td>e</td>\n",
       "      <td>38.029636</td>\n",
       "      <td>0.457377</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>7</td>\n",
       "      <td>n</td>\n",
       "      <td>30.939406</td>\n",
       "      <td>0.455755</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>7</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>15.224177</td>\n",
       "      <td>4.561035</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>7</td>\n",
       "      <td>neo</td>\n",
       "      <td>11.453909</td>\n",
       "      <td>Y</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>9</td>\n",
       "      <td>i</td>\n",
       "      <td>32.003468</td>\n",
       "      <td>34.471053</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>9</td>\n",
       "      <td>n</td>\n",
       "      <td>26.911428</td>\n",
       "      <td>0.508083</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>9</td>\n",
       "      <td>e</td>\n",
       "      <td>18.966981</td>\n",
       "      <td>0.370235</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>9</td>\n",
       "      <td>h</td>\n",
       "      <td>17.865198</td>\n",
       "      <td>20.717401</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>9</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>9.099745</td>\n",
       "      <td>4.069767</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    prototype_id         feature  contribution_score interpreted_value  rank\n",
       "0              3  condition_code           47.354488          6.712051     1\n",
       "1              3        data_arc           29.003140         253.67423     2\n",
       "2              3               h           15.000282         18.005536     3\n",
       "3              3      n_obs_used           14.650509         18.095537     4\n",
       "4              3               i            5.876609           10.6704     5\n",
       "5              0              om            2.823642         84.040274     1\n",
       "6              0  condition_code            1.396686           0.18725     2\n",
       "7              0               w            0.829923         205.94687     3\n",
       "8              0      n_obs_used            0.577087        199.988755     4\n",
       "9              0               n            0.460764          0.226685     5\n",
       "10             4               w            2.976050         88.600346     1\n",
       "11             4              om            2.472264        244.593422     2\n",
       "12             4  condition_code            1.299007          0.237537     3\n",
       "13             4      n_obs_used            0.579315        198.257712     4\n",
       "14             4              ma            0.565179        162.908708     5\n",
       "15             2      n_obs_used           16.539234       1149.485048     1\n",
       "16             2        data_arc           11.543415      12629.802255     2\n",
       "17             2               h           10.733020         13.951356     3\n",
       "18             2               q            4.311334          3.780162     4\n",
       "19             2            moid            4.298770          2.791547     5\n",
       "20             1               w            3.681780         277.27254     1\n",
       "21             1              om            3.497154        259.451502     2\n",
       "22             1  condition_code            1.538064          0.236619     3\n",
       "23             1               e            1.335589          0.124196     4\n",
       "24             1               i            0.946616          7.349716     5\n",
       "25             5               e           51.163396          0.594384     1\n",
       "26             5               h           29.822758         21.763982     2\n",
       "27             5  condition_code           16.744631          5.188867     3\n",
       "28             5        data_arc           11.857491       1152.310966     4\n",
       "29             5             neo            9.156986                 Y     5\n",
       "30             7               h           39.831296         22.939038     1\n",
       "31             7               e           38.029636          0.457377     2\n",
       "32             7               n           30.939406          0.455755     3\n",
       "33             7  condition_code           15.224177          4.561035     4\n",
       "34             7             neo           11.453909                 Y     5\n",
       "35             9               i           32.003468         34.471053     1\n",
       "36             9               n           26.911428          0.508083     2\n",
       "37             9               e           18.966981          0.370235     3\n",
       "38             9               h           17.865198         20.717401     4\n",
       "39             9  condition_code            9.099745          4.069767     5"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Вклады признаков внутри прототипов\n",
    "global_stats = {}\n",
    "for feat in X_trainval.columns:\n",
    "    if pd.api.types.is_numeric_dtype(X_trainval[feat]):\n",
    "        global_stats[feat] = {\n",
    "            \"type\": \"num\",\n",
    "            \"mean\": float(X_trainval[feat].mean()),\n",
    "            \"std\": float(X_trainval[feat].std(ddof=0) if X_trainval[feat].std(ddof=0) > 1e-8 else 1.0)\n",
    "        }\n",
    "    else:\n",
    "        global_mode = X_trainval[feat].mode(dropna=True)\n",
    "        global_stats[feat] = {\n",
    "            \"type\": \"cat\",\n",
    "            \"mode\": global_mode.iloc[0] if len(global_mode) else np.nan,\n",
    "            \"freq\": float((X_trainval[feat] == (global_mode.iloc[0] if len(global_mode) else np.nan)).mean()) if len(global_mode) else np.nan\n",
    "        }\n",
    "\n",
    "contrib_rows = []\n",
    "important_proto_ids = prototype_df.head(min(8, len(prototype_df)))[\"prototype_id\"].tolist()\n",
    "\n",
    "for pid in important_proto_ids:\n",
    "    mask = assign_idx_trainval == pid\n",
    "    subset_X = X_trainval.loc[mask]\n",
    "    if len(subset_X) == 0:\n",
    "        continue\n",
    "\n",
    "    proto_weight_abs = abs(coef[pid])\n",
    "\n",
    "    feat_rows = []\n",
    "    for feat in X_trainval.columns:\n",
    "        if pd.api.types.is_numeric_dtype(subset_X[feat]):\n",
    "            proto_mean = float(subset_X[feat].mean())\n",
    "            score = abs(proto_mean - global_stats[feat][\"mean\"]) / global_stats[feat][\"std\"]\n",
    "            interpreted_value = proto_mean\n",
    "        else:\n",
    "            mode_series = subset_X[feat].mode(dropna=True)\n",
    "            proto_mode = mode_series.iloc[0] if len(mode_series) else np.nan\n",
    "            proto_freq = float((subset_X[feat] == proto_mode).mean()) if pd.notna(proto_mode) else 0.0\n",
    "            global_freq = float((X_trainval[feat] == proto_mode).mean()) if pd.notna(proto_mode) else 0.0\n",
    "            score = abs(proto_freq - global_freq)\n",
    "            interpreted_value = proto_mode\n",
    "\n",
    "        feat_rows.append({\n",
    "            \"prototype_id\": pid,\n",
    "            \"feature\": feat,\n",
    "            \"contribution_score\": float(score * proto_weight_abs),\n",
    "            \"interpreted_value\": interpreted_value\n",
    "        })\n",
    "\n",
    "    feat_df = pd.DataFrame(feat_rows).sort_values(\"contribution_score\", ascending=False).head(5).reset_index(drop=True)\n",
    "    feat_df[\"rank\"] = np.arange(1, len(feat_df) + 1)\n",
    "    contrib_rows.append(feat_df)\n",
    "\n",
    "prototype_contrib_df = pd.concat(contrib_rows, ignore_index=True) if contrib_rows else pd.DataFrame()\n",
    "\n",
    "print(\"Топ-5 вкладов признаков для основных прототипов:\")\n",
    "display(prototype_contrib_df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "90a44731",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAABQkAAAHqCAYAAACnYcjKAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAMbZJREFUeJzt3XuU1XW9//HXADKIBKgoCHGc8oYK4oWD4S3NSbxRaiqJR5FMV6nraGQpaSCVIhUczExKJTwpRV5zZVFGkqmkiaLp8ZIXgl8JQhYoJujM/v3hcmoC1OEyA/N5PNbaq+a7v9/9/WxWa693z733d1dVKpVKAAAAAIBitWnpBQAAAAAALUskBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEwCZn6tSpqaqqWu3twgsvTJL88pe/zOmnn56+ffumbdu2qampadlFAwCwXvz7LNiuXbv06tUrp512Wv785z+vsn+lUskPfvCDHHTQQenatWs6duyYfv365Stf+UqWL1++xvPcdtttOeKII9KtW7e0b98+PXv2zIknnphf//rXTVrvwIEDU1VVlauvvnq1919yySWpqqrKkiVLVnt/3759c/DBB6+yfdmyZRk7dmz69++fTp06ZfPNN0/fvn1zwQUX5C9/+UuT1giQJO1aegEAa+srX/lKPvCBDzTa1rdv3yTJtGnTMn369Oy9997p2bNnSywPAIAN6O1Z8PXXX8/vfve7TJ06Nffee28ef/zxdOjQIUlSV1eXYcOG5cc//nEOPPDAXHLJJenYsWN++9vfZuzYsbnpppvyq1/9Kt27d2943Eqlkk996lOZOnVq9tprr4wcOTI9evTIiy++mNtuuy2HHnpo7rvvvuy3337vusY//vGP+f3vf5+amprceOON+exnP7tenvvzzz+f2trazJ8/PyeccELOPPPMtG/fPo899liuu+663HbbbXnmmWfWy7mAcoiEwCbriCOOyIABA1Z732WXXZZrrrkmm222WY4++ug8/vjjzby6Dau+vj4rV65sGIABAErzr7Pgpz/96XTr1i3jx4/PHXfckRNPPDFJ8vWvfz0//vGPc/755+cb3/hGw7FnnnlmTjzxxBxzzDE57bTT8vOf/7zhvgkTJmTq1Kk577zzMnHixFRVVTXcd9FFF+UHP/hB2rV7b/9X+oYbbsi2226bCRMm5Pjjj8+8efPW+Rsub775Zo477rgsWrQos2bNygEHHNDo/ksvvTTjx49fp3MAZfJ1Y6BV6tmzZzbbbLO1Pv7tr30888wz+a//+q906dIl22yzTb785S+nUqlkwYIF+fjHP57OnTunR48emTBhQqPjV65cmdGjR2efffZJly5dssUWW+TAAw/M3Xffvcq56uvrc8UVV6Rfv37p0KFDttlmmxx++OF56KGHGvapqqrKOeeckxtvvDG77757qqurM2PGjCTJI488kiOOOCKdO3dOp06dcuihh+Z3v/vdWj93AIBN0YEHHpgkee6555Ik//jHP/KNb3wjO++8c8aNG7fK/kOGDMnw4cMzY8aMhtnpH//4R8aNG5c+ffrkm9/8ZqNA+LZTTjklAwcOfE9rmjZtWo4//vgcffTR6dKlS6ZNm7a2T6/BLbfckkcffTQXXXTRKoEwSTp37pxLL710nc8DlEckBDZZS5cuzZIlSxrd1rehQ4emvr4+l19+efbdd9987Wtfy6RJk/LRj340vXr1yvjx47Pjjjvm/PPPzz333NNw3LJly3Lttdfm4IMPzvjx43PJJZdk8eLFGTx4cObOndvoHKeffnrOO++89O7dO+PHj8+FF16YDh06rBL6fv3rX+dzn/tchg4dmiuuuCI1NTV54okncuCBB+bRRx/NF7/4xXz5y1/OCy+8kIMPPjgPPPDAev/3AADYWM2bNy9JsuWW
WyZJ7r333vztb3/LsGHD1vjJv1NPPTVJ8tOf/rThmJdffjnDhg1L27Zt12k9DzzwQJ599tmcdNJJad++fY477rjceOON6/SYSXLHHXckeStWAqxPvm4MbLJqa2tX2VapVNbrOQYOHJjvfve7Sd76WkpNTU0+//nPZ9y4cbnggguSJCeddFJ69uyZKVOm5KCDDkry1nA6b968tG/fvuGxzjjjjPTp0ydXXnllrrvuuiTJ3XffnalTp+a///u/c8UVVzTs+/nPf36V5/L000/nD3/4Q3bbbbeGbccee2zeeOON3HvvvfngBz+Y5K1hd5dddskXv/jF/OY3v1mv/x4AABuLt98wfv311/PAAw9k7Nixqa6uztFHH50k+b//+78kSf/+/df4GG/f9+STTzb6z379+q3z+m644Yb07t07+++/f5Lkk5/8ZKZMmZK5c+dmzz33XOvHffLJJ9OlS5f07t17ndcI8K9EQmCTddVVV2XnnXfeoOf49Kc/3fDf27ZtmwEDBuT//b//l9NPP71he9euXbPLLrvk+eefb7Tv2+8+19fX5+9//3vq6+szYMCAPPzwww373XLLLamqqsqYMWNWOfe/f73lwx/+cKNAWFdXl1/+8pc55phjGgJhkmy33XYZNmxYrrnmmixbtiydO3deh38BAICN07+/YVxTU5Mbbrgh73//+5Mkr7zySpLkfe973xof4+37li1b1ug/3+mY9+LNN9/M9OnTM3z48IaZ7iMf+Ui23Xbb3HjjjesUCZctW7bO6wNYHZEQ2GQNHDhwjT9c8l7U1dVl8eLFjbZttdVWjT799x//8R+N7u/SpUs6dOiQbt26rbL9r3/9a6Nt119/fSZMmJCnnnoqb7zxRsP2f/1F5ueeey49e/bMVltt9a7r/fdfcl68eHFee+217LLLLqvsu+uuu6a+vj4LFizI7rvv/q6PDQCwqXn7DeOlS5dmypQpueeee1JdXd1w/9sh7e1YuDr/HhLffnP1nY552zvNkr/85S+zePHiDBw4MM8++2zD/Yccckh++MMfZvz48WnT5r1f/etf3zzu3LlzozenAdYX1yQEirVgwYJst912jW73339/o31Wdy2aNV2f5l+/HnzDDTfktNNOyw477JDrrrsuM2bMyF133ZWPfOQjqa+vX6v1br755mt1HABAazRw4MDU1tbmE5/4RO6444707ds3w4YNy6uvvprkrTdNk+Sxxx5b42O8fd/b39bo06dPkuQPf/jDu57/nWbJt689eOKJJ2annXZquE2fPj1//vOfG10SpkOHDkne+tGU1Xnttdca9nl7jUuXLs2CBQvedY0ATSESAsXq0aNH7rrrrka3d7pmTVPcfPPN+eAHP5hbb701p5xySgYPHpza2tq8/vrrjfbbYYcd8pe//CUvv/xyk8+xzTbbpGPHjnn66adXue+pp55KmzZtXKsGAChC27ZtM27cuPzlL3/Jt7/97STJAQcckK5du2batGmpq6tb7XH/+7//myQN1zE84IADsuWWW+aHP/zhGo9525pmyeXLl+cnP/lJhg4dmptuummV23bbbdfoB0y23377JFntTPfaa69lwYIFDfskb/0qc/LWm9IA65NICBSrQ4cOqa2tbXR7+9fw1tXbnzb8108XPvDAA5k9e3aj/T7xiU+kUqlk7NixqzzGu/0IS9u2bXPYYYflJz/5ScOv+SXJokWLMm3atBxwwAGuRwgAFOPggw/OwIEDM2nSpLz++uvp2LFjzj///Dz99NO56KKLVtn/zjvvzNSpUzN48OB86EMfSpJ07NgxF1xwQZ588slccMEFq53Hbrjhhjz44INrnCVvu+22LF++PGeffXaOP/74VW5HH310brnllqxYsSJJcuihh6Z9+/a5+uqrV/nGyfe+9728+eabOeKIIxq2HX/88enXr18uvfTSVWbL5K2vSq/u+QK8G9ckBFqlxx57LHfccUeS5Nlnn83SpUvzta99Lclbv2L3
9juwG8rRRx+dW2+9Nccee2yOOuqovPDCC5k8eXJ22223hq/AJG9dl+aUU07Jt771rfzxj3/M4Ycfnvr6+vz2t7/NIYccknPOOecdz/O1r30td911Vw444ICcddZZadeuXb773e9mxYoV+frXv75BnyMAwMbmC1/4Qk444YRMnTo1n/nMZ3LhhRfmkUceyfjx4zN79ux84hOfyOabb5577703N9xwQ3bddddcf/31qzzGE088kQkTJuTuu+/O8ccfnx49emThwoW5/fbb8+CDD65yiZp/deONN2brrbfOfvvtt9r7P/axj+Waa67JnXfemeOOOy7bbrttRo8enYsvvjgHHXRQPvaxj6Vjx465//7788Mf/jCHHXZYo9l1s802y6233pra2tocdNBBOfHEE7P//vtns802yxNPPJFp06Zlyy23zKWXXrp+/lGBclQANjHf//73K0kqv//97991n9Xdhg8f/q7nGDNmTCVJZfHixY22Dx8+vLLFFlussv+HP/zhyu67797wd319feWyyy6rbL/99pXq6urKXnvtVfnpT39aGT58eGX77bdvdOybb75Z+cY3vlHp06dPpX379pVtttmmcsQRR1TmzJnTsE+Sytlnn73atT788MOVwYMHVzp16lTp2LFj5ZBDDqncf//97/ocAQA2Re80C9bV1VV22GGHyg477FB58803G7Z9//vfr+y///6Vzp07Vzp06FDZfffdK2PHjq28+uqrazzPzTffXDnssMMqW221VaVdu3aV7bbbrjJ06NDKrFmz1njMokWLKu3atauccsopa9zntddeq3Ts2LFy7LHHNtp+ww03VD70oQ9Vtthii0p1dXWlT58+lbFjx1Zef/311T7O3/72t8ro0aMr/fr1q3Ts2LHSoUOHSt++fSujRo2qvPjii2s8P8CaVFUq7/J9NgAAAACgVXNNQgAAAAAonEgIAAAAAIUTCQEAAACgcE2OhPfcc0+GDBmSnj17pqqqKrfffvu7HjNr1qzsvffeqa6uzo477pipU6euxVIBAGhJ5kAAgNaryZFw+fLl6d+/f6666qr3tP8LL7yQo446Koccckjmzp2b8847L5/+9Kfzi1/8osmLBQCg5ZgDAQBar3X6deOqqqrcdtttOeaYY9a4zwUXXJA777wzjz/+eMO2T37yk/n73/+eGTNmrO2pAQBoQeZAAIDWpd2GPsHs2bNTW1vbaNvgwYNz3nnnrfGYFStWZMWKFQ1/19fX5+WXX87WW2+dqqqqDbVUAIBWq1Kp5JVXXknPnj3Tpk3zXJbaHAgA0PLe6xy4wSPhwoUL071790bbunfvnmXLluUf//hHNt9881WOGTduXMaOHbuhlwYAUJwFCxbk/e9/f7OcyxwIALDxeLc5cINHwrUxatSojBw5suHvpUuX5j/+4z+yYMGCdO7cuQVXBgCwaVq2bFl69+6d973vfS29lHdkDgQAWL/e6xy4wSNhjx49smjRokbbFi1alM6dO6/23eMkqa6uTnV19SrbO3fubDgEAFgHzfmV3U1xDqy58M4Nfg5gw5l3+VEtvYRm5TULNm3N/Zr1bnPgBr8gzaBBgzJz5sxG2+66664MGjRoQ58aAIAWZA4EANh0NDkSvvrqq5k7d27mzp2bJHnhhRcyd+7czJ8/P8lbXxE59dRTG/b/zGc+k+effz5f/OIX89RTT+U73/lOfvzjH+dzn/vc+nkGAAA0C3MgAEDr1eRI+NBDD2WvvfbKXnvtlSQZOXJk9tprr4wePTpJ8uKLLzYMiknygQ98IHfeeWfuuuuu9O/fPxMmTMi1116bwYMHr6enAABAczAHAgC0XlWVSqXS0ot4N8uWLUuXLl2ydOlS1yQEAFgLm+o81dzrdn0v2LS5JiGwKWmu16z3Ok9t8GsSAgAAAAAbN5EQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAA
UDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAq3VpHwqquuSk1NTTp06JB99903Dz744DvuP2nSpOyyyy7ZfPPN07t373zuc5/L66+/vlYLBgCg5ZgDAQBapyZHwunTp2fkyJEZM2ZMHn744fTv3z+DBw/OSy+9tNr9p02blgsvvDBjxozJk08+meuuuy7Tp0/Pl770pXVePAAAzcccCADQejU5Ek6cODFnnHFGRowYkd122y2TJ09Ox44dM2XKlNXuf//992f//ffPsGHDUlNTk8MOOywnnXTSu77rDADAxsUcCADQejUpEq5cuTJz5sxJbW3tPx+gTZvU1tZm9uzZqz1mv/32y5w5cxqGweeffz4/+9nPcuSRR67xPCtWrMiyZcsa3QAAaDnmQACA1q1dU3ZesmRJ6urq0r1790bbu3fvnqeeemq1xwwbNixLlizJAQcckEqlkjfffDOf+cxn3vFrJuPGjcvYsWObsjQAADYgcyAAQOu2wX/deNasWbnsssvyne98Jw8//HBuvfXW3HnnnfnqV7+6xmNGjRqVpUuXNtwWLFiwoZcJAMB6Zg4EANh0NOmThN26dUvbtm2zaNGiRtsXLVqUHj16rPaYL3/5yznllFPy6U9/OknSr1+/LF++PGeeeWYuuuiitGmzaqesrq5OdXV1U5YGAMAGZA4EAGjdmvRJwvbt22efffbJzJkzG7bV19dn5syZGTRo0GqPee2111YZANu2bZskqVQqTV0vAAAtwBwIANC6NemThEkycuTIDB8+PAMGDMjAgQMzadKkLF++PCNGjEiSnHrqqenVq1fGjRuXJBkyZEgmTpyYvfbaK/vuu2+effbZfPnLX86QIUMahkQAADZ+5kAAgNaryZFw6NChWbx4cUaPHp2FCxdmzz33zIwZMxouYj1//vxG7xhffPHFqaqqysUXX5w///nP2WabbTJkyJBceuml6+9ZAACwwZkDAQBar6rKJvBdj2XLlqVLly5ZunRpOnfu3NLLAQDY5Gyq81Rzr7vmwjs3+DmADWfe5Ue19BKaldcs2LQ112vWe52nNvivGwMAAAAAGzeREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACg
cCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAULi1ioRXXXVVampq0qFDh+y777558MEH33H/v//97zn77LOz3Xbbpbq6OjvvvHN+9rOfrdWCAQBoOeZAAIDWqV1TD5g+fXpGjhyZyZMnZ999982kSZMyePDgPP3009l2221X2X/lypX56Ec/mm233TY333xzevXqlT/96U/p2rXr+lg/AADNxBwIANB6NTkSTpw4MWeccUZGjBiRJJk8eXLuvPPOTJkyJRdeeOEq+0+ZMiUvv/xy7r///my22WZJkpqamnVbNQAAzc4cCADQejXp68YrV67MnDlzUltb+88HaNMmtbW1mT179mqPueOOOzJo0KCcffbZ6d69e/r27ZvLLrssdXV167ZyAACajTkQAKB1a9InCZcsWZK6urp079690fbu3bvnqaeeWu0xzz//fH7961/n5JNPzs9+9rM8++yzOeuss/LGG29kzJgxqz1mxYoVWbFiRcPfy5Yta8oyAQBYz8yBAACt2wb/deP6+vpsu+22+d73vpd99tknQ4cOzUUXXZTJkyev8Zhx48alS5cuDbfevXtv6GUCALCemQMBADYdTYqE3bp1S9u2bbNo0aJG2xctWpQePXqs9pjtttsuO++8c9q2bduwbdddd83ChQuzcuXK1R4zatSoLF26tOG2YMGCpiwTAID1zBwIANC6NSkStm/fPvvss09mzpzZsK2+vj4zZ87MoEGDVnvM/vvvn2effTb19fUN25555plst912ad++/WqPqa6uTufOnRvdAABoOeZAAIDWrclfNx45cmSuueaaXH/99XnyySfz2c9+NsuXL2/4lbtTTz01o0aNatj/s5/9bF5++eWce+65eeaZZ3LnnXfmsssuy9lnn73+ngUAABucORAAoPVq0g+XJMnQoUOzePHijB49OgsXLsyee+6ZGTNmNFzEev78+WnT5p/tsXfv3vnFL36Rz33uc9ljjz3Sq1evnHvuubngggvW37NYz2ouvLOllwCsg3mXH9XSS2hWXrNg07WpvV6VMAcCAJSqyZEwSc4555ycc845q71v1qxZq2wbNGhQfve7363NqQAA2IiYAwEAWqcN/uvGAAAAAMDGTSQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAA
oHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKtVSS86qqrUlNTkw4dOmTffffNgw8++J6O+9GPfpSqqqocc8wxa3NaAAA2AmZBAIDWp8mRcPr06Rk5cmTGjBmThx9+OP3798/gwYPz0ksvveNx8+bNy/nnn58DDzxwrRcLAEDLMgsCALROTY6EEydOzBlnnJERI0Zkt912y+TJk9OxY8dMmTJljcfU1dXl5JNPztixY/PBD35wnRYMAEDLMQsCALROTYqEK1euzJw5c1JbW/vPB2jTJrW1tZk9e/Yaj/vKV76SbbfdNqeffvrarxQAgBZlFgQAaL3aNWXnJUuWpK6uLt27d2+0vXv37nnqqadWe8y9996b6667LnPnzn3P51mxYkVWrFjR8PeyZcuaskwAADaA5pgFzYEAAC1jg/668SuvvJJTTjkl11xzTbp16/aejxs3bly6dOnScOvdu/cGXCUAABvC2syC5kAAgJbRpE8SduvWLW3bts2iRYsabV+0aFF69Oixyv7PPfdc5s2blyFDhjRsq6+vf+vE7drl6aefzg477LDKcaNGjcrIkSMb/l62bJkBEQCghTXHLGgOBABoGU2KhO3bt88+++yTmTNn5phjjkny1qA3c+bMnHPOOavs36dPn/zhD39otO3iiy/OK6+8kiuuuGKNA191dXWqq6ubsjQAADaw5pgFzYEAAC2jSZEwSUaOHJnhw4dnwIABGThwYCZNmpTly5dnxIgRSZJTTz01vXr1yrhx49KhQ4f07du30fFdu3ZNklW2AwCw8TMLAgC0Tk2OhEOHDs3ixYszevToLFy4MHvuuWdmzJjRcAHr+fPnp02bDXqpQwAAWohZEACgdWpyJEySc845Z7VfKUmSWbNmveOxU6dOXZtTAgCwkTALAgC0Pt7mBQAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAULi1ioRXXXVVampq0qFDh+y777558MEH17jvNddckwMPPDBbbrllttxyy9TW1r7j/gAAbNzMggAArU+TI+H06dMzcuTIjBkzJg8//HD6
9++fwYMH56WXXlrt/rNmzcpJJ52Uu+++O7Nnz07v3r1z2GGH5c9//vM6Lx4AgOZlFgQAaJ2aHAknTpyYM844IyNGjMhuu+2WyZMnp2PHjpkyZcpq97/xxhtz1llnZc8990yfPn1y7bXXpr6+PjNnzlznxQMA0LzMggAArVOTIuHKlSszZ86c1NbW/vMB2rRJbW1tZs+e/Z4e47XXXssbb7yRrbbaao37rFixIsuWLWt0AwCgZTXHLGgOBABoGU2KhEuWLEldXV26d+/eaHv37t2zcOHC9/QYF1xwQXr27NlouPx348aNS5cuXRpuvXv3bsoyAQDYAJpjFjQHAgC0jGb9dePLL788P/rRj3LbbbelQ4cOa9xv1KhRWbp0acNtwYIFzbhKAAA2hPcyC5oDAQBaRrum7NytW7e0bds2ixYtarR90aJF6dGjxzse+81vfjOXX355fvWrX2WPPfZ4x32rq6tTXV3dlKUBALCBNccsaA4EAGgZTfokYfv27bPPPvs0utD02xeeHjRo0BqP+/rXv56vfvWrmTFjRgYMGLD2qwUAoMWYBQEAWq8mfZIwSUaOHJnhw4dnwIABGThwYCZNmpTly5dnxIgRSZJTTz01vXr1yrhx45Ik48ePz+jRozNt2rTU1NQ0XK+mU6dO6dSp03p8KgAAbGhmQQCA1qnJkXDo0KFZvHhxRo8enYULF2bPPffMjBkzGi5gPX/+/LRp888PKF599dVZuXJljj/++EaPM2bMmFxyySXrtnoAAJqVWRAAoHVqciRMknPOOSfnnHPOau+bNWtWo7/nzZu3NqcAAGAjZRYEAGh9mvXXjQEAAACAjY9ICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFEwkBAAAAoHAiIQAAAAAUTiQEAAAAgMKJhAAAAABQOJEQAAAAAAonEgIAAABA4URCAAAAACicSAgAAAAAhRMJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFA4kRAAAAAACicSAgAAAEDhREIAAAAAKJxICAAAAACFW6tIeNVVV6WmpiYdOnTIvvvumwcffPAd97/pppvSp0+fdOjQIf369cvPfvaztVosAAAtzywIAND6NDkSTp8+PSNHjsyYMWPy8MMPp3///hk8eHBeeuml1e5///3356STTsrpp5+eRx55JMccc0yOOeaYPP744+u8eAAAmpdZEACgdWpyJJw4cWLOOOOMjBgxIrvttlsmT56cjh07ZsqUKavd/4orrsjhhx+eL3zhC9l1113z1a9+NXvvvXe+/e1vr/PiAQBoXmZBAIDWqV1Tdl65cmXmzJmTUaNGNWxr06ZNamtrM3v27NUeM3v27IwcObLRtsGDB+f2229f43lWrFiRFStWNPy9dOnSJMmyZcuasty1Vr/itWY5D7BhNNdrxcbCaxZsuprz9ertc1UqlbV+jOaYBc2B
wLowBwKbkuZ6zXqvc2CTIuGSJUtSV1eX7t27N9revXv3PPXUU6s9ZuHChavdf+HChWs8z7hx4zJ27NhVtvfu3bspywUK1WVSS68A4L1piderV155JV26dFmrY5tjFjQHAuvCHAhsSpr7Nevd5sAmRcLmMmrUqEbvONfX1+fll1/O1ltvnaqqqhZcGZu6ZcuWpXfv3lmwYEE6d+7c0ssBeEdes1ifKpVKXnnllfTs2bOll/KOzIFsSF5XgU2J1yzWl/c6BzYpEnbr1i1t27bNokWLGm1ftGhRevTosdpjevTo0aT9k6S6ujrV1dWNtnXt2rUpS4V31LlzZy+ywCbDaxbry9p+gvBtzTELmgNpDl5XgU2J1yzWh/cyBzbph0vat2+fffbZJzNnzmzYVl9fn5kzZ2bQoEGrPWbQoEGN9k+Su+66a437AwCwcTILAgC0Xk3+uvHIkSMzfPjwDBgwIAMHDsykSZOyfPnyjBgxIkly6qmnplevXhk3blyS5Nxzz82HP/zhTJgwIUcddVR+9KMf5aGHHsr3vve99ftMAADY4MyCAACtU5Mj4dChQ7N48eKMHj06CxcuzJ577pkZM2Y0XJB6/vz5adPmnx9Q3G+//TJt2rRcfPHF+dKXvpSddtopt99+e/r27bv+ngW8R9XV1RkzZswqX2MC2Bh5zWJjZBZkU+Z1FdiUeM2iuVVV3u33jwEAAACAVq1J1yQEAAAAAFofkRAAAAAACicSAgAAAEDhREJarZqamkyaNKlZzzl16tR07dq1Wc8JAEBj5kAAaDqRkI3KwQcfnPPOO2+9PNbvf//7nHnmmevlsQDWl/X5Opckp512Wo455pj19ngALcUcCJTALMjGrF1LLwCaolKppK6uLu3avfv/dLfZZptmWBEAAM3BHAgAG5ZPErLROO200/Kb3/wmV1xxRaqqqlJVVZWpU6emqqoqP//5z7PPPvukuro69957b5577rl8/OMfT/fu3dOpU6f853/+Z371q181erx//5pJVVVVrr322hx77LHp2LFjdtppp9xxxx1Jkvr6+rz//e/P1Vdf3egxHnnkkbRp0yZ/+tOfkiQTJ05Mv379ssUWW6R3794566yz8uqrr27Yfxig1Vjd69y8efPy+OOP54gjjkinTp3SvXv3nHLKKVmyZEnDcTfffHP69euXzTffPFtvvXVqa2uzfPnyXHLJJbn++uvzk5/8pOHxZs2a1XJPEGAtmQOBEpgF2diJhGw0rrjiigwaNChnnHFGXnzxxbz44ovp3bt3kuTCCy/M5ZdfnieffDJ77LFHXn311Rx55JGZOXNmHnnkkRx++OEZMmRI5s+f/47nGDt2bE488cQ89thjOfLII3PyySfn5ZdfTps2bXLSSSdl2rRpjfa/8cYbs//++2f77bdPkrRp0ybf+ta38sQTT+T666/Pr3/963zxi1/cMP8gQKuzute5973vffnIRz6SvfbaKw899FBmzJiRRYsW5cQTT0ySvPjiiznppJPyqU99Kk8++WRmzZqV4447LpVKJeeff35OPPHEHH744Q2Pt99++7XwswRoOnMgUAKzIBu9CmxEPvzhD1fOPffchr/vvvvuSpLK7bff/q7H7r777pUrr7yy4e/tt9++8j//8z8NfyepXHzxxQ1/v/rqq5UklZ///OeVSqVSeeSRRypVVVWVP/3pT5VKpVKpq6ur9OrVq3L11Vev8Zw33XRTZeutt274+/vf/36lS5cu77pWoFz//jr31a9+tXLYYYc12mfBggWVJJWnn366MmfOnEqSyrx581b7eMOHD698/OMf34ArBmge5kCgBGZBNmY+ScgmYcCAAY3+fvXVV3P++edn1113TdeuXdOpU6c8+eST7/oO8h577NHw37fYYot07tw5L730UpJkzz33zK677trwLvJvfvObvPTSSznhhBMajvnVr36VQw89NL169cr73ve+nHLKKfnrX/+a1157bX09VaAwjz76aO6+
++506tSp4danT58kyXPPPZf+/fvn0EMPTb9+/XLCCSfkmmuuyd/+9rcWXjVA8zEHAq2ZWZCNiUjIJmGLLbZo9Pf555+f2267LZdddll++9vfZu7cuenXr19Wrlz5jo+z2WabNfq7qqoq9fX1DX+ffPLJDcPhtGnTcvjhh2frrbdOksybNy9HH3109thjj9xyyy2ZM2dOrrrqqiR51/MCrMmrr76aIUOGZO7cuY1uf/zjH3PQQQelbdu2ueuuu/Lzn/88u+22W6688srssssueeGFF1p66QDNwhwItGZmQTYmIiEblfbt26euru5d97vvvvty2mmn5dhjj02/fv3So0ePzJs3b53PP2zYsDz++OOZM2dObr755px88skN982ZMyf19fWZMGFCPvShD2XnnXfOX/7yl3U+J1CWf3+d23vvvfPEE0+kpqYmO+64Y6Pb2//HuKqqKvvvv3/Gjh2bRx55JO3bt89tt9222scD2FSZA4ESmAXZmImEbFRqamrywAMPZN68eVmyZEmjd3f/1U477ZRbb701c+fOzaOPPpphw4atcd+mnn+//fbL6aefnrq6unzsYx9ruG/HHXfMG2+8kSuvvDLPP/98fvCDH2Ty5MnrfE6gLP/+Onf22Wfn5ZdfzkknnZTf//73ee655/KLX/wiI0aMSF1dXR544IFcdtlleeihhzJ//vzceuutWbx4cXbdddeGx3vsscfy9NNPZ8mSJXnjjTda+BkCrB1zIFACsyAbM5GQjcr555+ftm3bZrfddss222yzxmvLTJw4MVtuuWX222+/DBkyJIMHD87ee++9XtZw8skn59FHH82xxx6bzTffvGF7//79M3HixIwfPz59+/bNjTfemHHjxq2XcwLl+PfXuZUrV+a+++5LXV1dDjvssPTr1y/nnXdeunbtmjZt2qRz58655557cuSRR2bnnXfOxRdfnAkTJuSII45IkpxxxhnZZZddMmDAgGyzzTa57777WvgZAqwdcyBQArMgG7OqSqVSaelFAAAAAAAtxycJAQAAAKBwIiEAAAAAFE4kBAAAAIDCiYQAAAAAUDiREAAAAAAKJxICAAAAQOFEQgAAAAAonEgIAAAAAIUTCQEAAACgcCIhAAAAABROJAQAAACAwomEAAAAAFC4/w+Vfzk91BeMgwAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 1300x500 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Иллюстрация качества\n",
    "fig, axes = plt.subplots(1, 2, figsize=(13, 5))\n",
    "\n",
    "axes[0].bar(results_df[\"split\"], results_df[\"f1_macro\"])\n",
    "axes[0].set_title(\"F1-macro\")\n",
    "axes[0].set_ylim(0, 1)\n",
    "\n",
    "axes[1].bar(results_df[\"split\"], results_df[\"roc_auc\"])\n",
    "axes[1].set_title(\"ROC-AUC\")\n",
    "axes[1].set_ylim(0, 1)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "63615e54",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Первые 10 фактических и предсказанных значений на test:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>object_name</th>\n",
       "      <th>y_true</th>\n",
       "      <th>y_pred</th>\n",
       "      <th>y_pred_proba</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000041</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000245</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Dinotina</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.001224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000040</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000272</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  object_name  y_true  y_pred  y_pred_proba\n",
       "0         nan       0       0      0.000041\n",
       "1         nan       0       0      0.000116\n",
       "2         nan       0       0      0.000245\n",
       "3         nan       0       0      0.000022\n",
       "4         nan       0       0      0.000150\n",
       "5         nan       0       0      0.000084\n",
       "6         nan       0       0      0.000121\n",
       "7    Dinotina       0       0      0.001224\n",
       "8         nan       0       0      0.000040\n",
       "9         nan       0       0      0.000272"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Примеры предсказаний на test\n",
    "pred_df = pd.DataFrame({\n",
    "    \"object_name\": pd.Series(names_test).astype(str).values,\n",
    "    \"y_true\": y_test.values,\n",
    "    \"y_pred\": test_pred,\n",
    "    \"y_pred_proba\": test_proba\n",
    "})\n",
    "print(\"Первые 10 фактических и предсказанных значений на test:\")\n",
    "display(pred_df.head(10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5c82aace",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== КЛЮЧЕВАЯ ИНФОРМАЦИЯ ДЛЯ ВЫВОДА ===\n",
      "Зависимая переменная: pha\n",
      "\n",
      "Число исходных признаков: 18\n",
      "Число признаков после кодирования: 30\n",
      "Лучшая конфигурация прототипной сети:\n",
      "n_prototypes_per_class = 5\n",
      "total_prototypes = 10\n",
      "sigma_multiplier = 2.0\n",
      "sigma = 6.271090984344482\n",
      "C = 2.0\n",
      "\n",
      "Распределение классов:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>share</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pha</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.997552</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.002448</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        share\n",
       "pha          \n",
       "0    0.997552\n",
       "1    0.002448"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Использованные исходные признаки:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>e</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>i</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>om</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>w</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>ad</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>per_y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>data_arc</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>condition_code</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>n_obs_used</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>h</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>moid</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>n</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>per</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>ma</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>neo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>class</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           feature\n",
       "0                a\n",
       "1                e\n",
       "2                i\n",
       "3               om\n",
       "4                w\n",
       "5                q\n",
       "6               ad\n",
       "7            per_y\n",
       "8         data_arc\n",
       "9   condition_code\n",
       "10      n_obs_used\n",
       "11               h\n",
       "12            moid\n",
       "13               n\n",
       "14             per\n",
       "15              ma\n",
       "16             neo\n",
       "17           class"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model metrics:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>split</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>balanced_accuracy</th>\n",
       "      <th>precision_macro</th>\n",
       "      <th>recall_macro</th>\n",
       "      <th>f1_macro</th>\n",
       "      <th>roc_auc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>trainval</td>\n",
       "      <td>0.979375</td>\n",
       "      <td>0.988644</td>\n",
       "      <td>0.553000</td>\n",
       "      <td>0.988644</td>\n",
       "      <td>0.590603</td>\n",
       "      <td>0.995785</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>test</td>\n",
       "      <td>0.979320</td>\n",
       "      <td>0.985546</td>\n",
       "      <td>0.552416</td>\n",
       "      <td>0.985546</td>\n",
       "      <td>0.589591</td>\n",
       "      <td>0.996248</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      split  accuracy  balanced_accuracy  precision_macro  recall_macro  \\\n",
       "0  trainval  0.979375           0.988644         0.553000      0.988644   \n",
       "1      test  0.979320           0.985546         0.552416      0.985546   \n",
       "\n",
       "   f1_macro   roc_auc  \n",
       "0  0.590603  0.995785  \n",
       "1  0.589591  0.996248  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top 10 configurations by validation score:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>n_prototypes_per_class</th>\n",
       "      <th>total_prototypes</th>\n",
       "      <th>sigma_multiplier</th>\n",
       "      <th>sigma</th>\n",
       "      <th>C</th>\n",
       "      <th>val_accuracy</th>\n",
       "      <th>val_balanced_accuracy</th>\n",
       "      <th>val_precision_macro</th>\n",
       "      <th>val_recall_macro</th>\n",
       "      <th>val_f1_macro</th>\n",
       "      <th>val_roc_auc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.948137</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97934</td>\n",
       "      <td>0.989645</td>\n",
       "      <td>0.552814</td>\n",
       "      <td>0.989645</td>\n",
       "      <td>0.590305</td>\n",
       "      <td>0.996080</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.948137</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.97890</td>\n",
       "      <td>0.989424</td>\n",
       "      <td>0.551827</td>\n",
       "      <td>0.989424</td>\n",
       "      <td>0.588574</td>\n",
       "      <td>0.996067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.807996</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97860</td>\n",
       "      <td>0.989274</td>\n",
       "      <td>0.551174</td>\n",
       "      <td>0.989274</td>\n",
       "      <td>0.587425</td>\n",
       "      <td>0.995346</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.948137</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0.97824</td>\n",
       "      <td>0.989093</td>\n",
       "      <td>0.550413</td>\n",
       "      <td>0.989093</td>\n",
       "      <td>0.586078</td>\n",
       "      <td>0.995906</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.903998</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97822</td>\n",
       "      <td>0.984995</td>\n",
       "      <td>0.550031</td>\n",
       "      <td>0.984995</td>\n",
       "      <td>0.585391</td>\n",
       "      <td>0.995502</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.807996</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.97802</td>\n",
       "      <td>0.988983</td>\n",
       "      <td>0.549959</td>\n",
       "      <td>0.988983</td>\n",
       "      <td>0.585272</td>\n",
       "      <td>0.995145</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.745638</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97786</td>\n",
       "      <td>0.988903</td>\n",
       "      <td>0.549634</td>\n",
       "      <td>0.988903</td>\n",
       "      <td>0.584693</td>\n",
       "      <td>0.994570</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.903998</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.97770</td>\n",
       "      <td>0.988823</td>\n",
       "      <td>0.549313</td>\n",
       "      <td>0.988823</td>\n",
       "      <td>0.584120</td>\n",
       "      <td>0.995309</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.807996</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0.97768</td>\n",
       "      <td>0.988813</td>\n",
       "      <td>0.549273</td>\n",
       "      <td>0.988813</td>\n",
       "      <td>0.584049</td>\n",
       "      <td>0.994914</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>2.0</td>\n",
       "      <td>7.491276</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.97758</td>\n",
       "      <td>0.988763</td>\n",
       "      <td>0.549075</td>\n",
       "      <td>0.988763</td>\n",
       "      <td>0.583695</td>\n",
       "      <td>0.993315</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   n_prototypes_per_class  total_prototypes  sigma_multiplier     sigma    C  \\\n",
       "0                       5                10               2.0  5.948137  2.0   \n",
       "1                       5                10               2.0  5.948137  1.0   \n",
       "2                       4                 8               2.0  5.807996  2.0   \n",
       "3                       5                10               2.0  5.948137  0.5   \n",
       "4                       4                 8               1.0  2.903998  2.0   \n",
       "5                       4                 8               2.0  5.807996  1.0   \n",
       "6                       3                 6               1.0  3.745638  2.0   \n",
       "7                       4                 8               1.0  2.903998  1.0   \n",
       "8                       4                 8               2.0  5.807996  0.5   \n",
       "9                       3                 6               2.0  7.491276  2.0   \n",
       "\n",
       "   val_accuracy  val_balanced_accuracy  val_precision_macro  val_recall_macro  \\\n",
       "0       0.97934               0.989645             0.552814          0.989645   \n",
       "1       0.97890               0.989424             0.551827          0.989424   \n",
       "2       0.97860               0.989274             0.551174          0.989274   \n",
       "3       0.97824               0.989093             0.550413          0.989093   \n",
       "4       0.97822               0.984995             0.550031          0.984995   \n",
       "5       0.97802               0.988983             0.549959          0.988983   \n",
       "6       0.97786               0.988903             0.549634          0.988903   \n",
       "7       0.97770               0.988823             0.549313          0.988823   \n",
       "8       0.97768               0.988813             0.549273          0.988813   \n",
       "9       0.97758               0.988763             0.549075          0.988763   \n",
       "\n",
       "   val_f1_macro  val_roc_auc  \n",
       "0      0.590305     0.996080  \n",
       "1      0.588574     0.996067  \n",
       "2      0.587425     0.995346  \n",
       "3      0.586078     0.995906  \n",
       "4      0.585391     0.995502  \n",
       "5      0.585272     0.995145  \n",
       "6      0.584693     0.994570  \n",
       "7      0.584120     0.995309  \n",
       "8      0.584049     0.994914  \n",
       "9      0.583695     0.993315  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top 10 features by importance:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>feature</th>\n",
       "      <th>permutation_importance_mean</th>\n",
       "      <th>permutation_importance_std</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>e</td>\n",
       "      <td>0.047664</td>\n",
       "      <td>0.002647</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>condition_code</td>\n",
       "      <td>0.007815</td>\n",
       "      <td>0.000350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>data_arc</td>\n",
       "      <td>0.003683</td>\n",
       "      <td>0.000315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>n</td>\n",
       "      <td>0.002203</td>\n",
       "      <td>0.003038</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>0.001505</td>\n",
       "      <td>0.000192</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>ma</td>\n",
       "      <td>0.000422</td>\n",
       "      <td>0.000162</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>w</td>\n",
       "      <td>0.000026</td>\n",
       "      <td>0.000099</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>per</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>per_y</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>ad</td>\n",
       "      <td>-0.000026</td>\n",
       "      <td>0.000037</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          feature  permutation_importance_mean  permutation_importance_std\n",
       "0               e                     0.047664                    0.002647\n",
       "1  condition_code                     0.007815                    0.000350\n",
       "2        data_arc                     0.003683                    0.000315\n",
       "3               n                     0.002203                    0.003038\n",
       "4      n_obs_used                     0.001505                    0.000192\n",
       "5              ma                     0.000422                    0.000162\n",
       "6               w                     0.000026                    0.000099\n",
       "7             per                     0.000000                    0.000000\n",
       "8           per_y                     0.000000                    0.000000\n",
       "9              ad                    -0.000026                    0.000037"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Main prototypes:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>prototype_id</th>\n",
       "      <th>prototype_class</th>\n",
       "      <th>support_n</th>\n",
       "      <th>support_share_trainval</th>\n",
       "      <th>prototype_output_weight</th>\n",
       "      <th>impact_score</th>\n",
       "      <th>assigned_positive_rate_trainval</th>\n",
       "      <th>nearest_objects</th>\n",
       "      <th>e</th>\n",
       "      <th>condition_code</th>\n",
       "      <th>data_arc</th>\n",
       "      <th>n</th>\n",
       "      <th>n_obs_used</th>\n",
       "      <th>ma</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>27675</td>\n",
       "      <td>0.138375</td>\n",
       "      <td>-22.340298</td>\n",
       "      <td>3.091339</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.173780</td>\n",
       "      <td>6.712051</td>\n",
       "      <td>253.674230</td>\n",
       "      <td>0.237302</td>\n",
       "      <td>18.095537</td>\n",
       "      <td>181.147050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>71765</td>\n",
       "      <td>0.358825</td>\n",
       "      <td>-3.436301</td>\n",
       "      <td>1.233031</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.143826</td>\n",
       "      <td>0.187250</td>\n",
       "      <td>5927.571853</td>\n",
       "      <td>0.226685</td>\n",
       "      <td>199.988755</td>\n",
       "      <td>189.886354</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>48725</td>\n",
       "      <td>0.243625</td>\n",
       "      <td>-3.356772</td>\n",
       "      <td>0.817794</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.155231</td>\n",
       "      <td>0.237537</td>\n",
       "      <td>5965.458902</td>\n",
       "      <td>0.233649</td>\n",
       "      <td>198.257712</td>\n",
       "      <td>162.908708</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>19161</td>\n",
       "      <td>0.095805</td>\n",
       "      <td>6.953126</td>\n",
       "      <td>0.666144</td>\n",
       "      <td>0.000052</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.137637</td>\n",
       "      <td>0.206524</td>\n",
       "      <td>12629.802255</td>\n",
       "      <td>0.217945</td>\n",
       "      <td>1149.485048</td>\n",
       "      <td>176.631783</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>27221</td>\n",
       "      <td>0.136105</td>\n",
       "      <td>-3.970871</td>\n",
       "      <td>0.540455</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.124196</td>\n",
       "      <td>0.236619</td>\n",
       "      <td>6007.014474</td>\n",
       "      <td>0.235234</td>\n",
       "      <td>199.583740</td>\n",
       "      <td>192.906197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>2012</td>\n",
       "      <td>0.010060</td>\n",
       "      <td>10.944390</td>\n",
       "      <td>0.110101</td>\n",
       "      <td>0.086481</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.594384</td>\n",
       "      <td>5.188867</td>\n",
       "      <td>1152.310966</td>\n",
       "      <td>0.264277</td>\n",
       "      <td>92.923956</td>\n",
       "      <td>172.768315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>1237</td>\n",
       "      <td>0.006185</td>\n",
       "      <td>11.830092</td>\n",
       "      <td>0.073169</td>\n",
       "      <td>0.099434</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.457377</td>\n",
       "      <td>4.561035</td>\n",
       "      <td>1747.569672</td>\n",
       "      <td>0.455755</td>\n",
       "      <td>109.617623</td>\n",
       "      <td>176.214085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>817</td>\n",
       "      <td>0.004085</td>\n",
       "      <td>8.297359</td>\n",
       "      <td>0.033895</td>\n",
       "      <td>0.083231</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.370235</td>\n",
       "      <td>4.069767</td>\n",
       "      <td>2102.944717</td>\n",
       "      <td>0.508083</td>\n",
       "      <td>116.062424</td>\n",
       "      <td>174.708108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>190</td>\n",
       "      <td>0.000950</td>\n",
       "      <td>7.401795</td>\n",
       "      <td>0.007032</td>\n",
       "      <td>0.178947</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.410074</td>\n",
       "      <td>4.278947</td>\n",
       "      <td>1910.597884</td>\n",
       "      <td>1.374819</td>\n",
       "      <td>152.363158</td>\n",
       "      <td>172.634906</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>1197</td>\n",
       "      <td>0.005985</td>\n",
       "      <td>0.684461</td>\n",
       "      <td>0.004096</td>\n",
       "      <td>0.075188</td>\n",
       "      <td>nan, nan, nan</td>\n",
       "      <td>0.247385</td>\n",
       "      <td>5.035923</td>\n",
       "      <td>1103.224767</td>\n",
       "      <td>0.835589</td>\n",
       "      <td>86.230576</td>\n",
       "      <td>176.160290</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   prototype_id  prototype_class  support_n  support_share_trainval  \\\n",
       "0             3                0      27675                0.138375   \n",
       "1             0                0      71765                0.358825   \n",
       "2             4                0      48725                0.243625   \n",
       "3             2                0      19161                0.095805   \n",
       "4             1                0      27221                0.136105   \n",
       "5             5                1       2012                0.010060   \n",
       "6             7                1       1237                0.006185   \n",
       "7             9                1        817                0.004085   \n",
       "8             8                1        190                0.000950   \n",
       "9             6                1       1197                0.005985   \n",
       "\n",
       "   prototype_output_weight  impact_score  assigned_positive_rate_trainval  \\\n",
       "0               -22.340298      3.091339                         0.000000   \n",
       "1                -3.436301      1.233031                         0.000000   \n",
       "2                -3.356772      0.817794                         0.000000   \n",
       "3                 6.953126      0.666144                         0.000052   \n",
       "4                -3.970871      0.540455                         0.000000   \n",
       "5                10.944390      0.110101                         0.086481   \n",
       "6                11.830092      0.073169                         0.099434   \n",
       "7                 8.297359      0.033895                         0.083231   \n",
       "8                 7.401795      0.007032                         0.178947   \n",
       "9                 0.684461      0.004096                         0.075188   \n",
       "\n",
       "  nearest_objects         e  condition_code      data_arc         n  \\\n",
       "0   nan, nan, nan  0.173780        6.712051    253.674230  0.237302   \n",
       "1   nan, nan, nan  0.143826        0.187250   5927.571853  0.226685   \n",
       "2   nan, nan, nan  0.155231        0.237537   5965.458902  0.233649   \n",
       "3   nan, nan, nan  0.137637        0.206524  12629.802255  0.217945   \n",
       "4   nan, nan, nan  0.124196        0.236619   6007.014474  0.235234   \n",
       "5   nan, nan, nan  0.594384        5.188867   1152.310966  0.264277   \n",
       "6   nan, nan, nan  0.457377        4.561035   1747.569672  0.455755   \n",
       "7   nan, nan, nan  0.370235        4.069767   2102.944717  0.508083   \n",
       "8   nan, nan, nan  0.410074        4.278947   1910.597884  1.374819   \n",
       "9   nan, nan, nan  0.247385        5.035923   1103.224767  0.835589   \n",
       "\n",
       "    n_obs_used          ma  \n",
       "0    18.095537  181.147050  \n",
       "1   199.988755  189.886354  \n",
       "2   198.257712  162.908708  \n",
       "3  1149.485048  176.631783  \n",
       "4   199.583740  192.906197  \n",
       "5    92.923956  172.768315  \n",
       "6   109.617623  176.214085  \n",
       "7   116.062424  174.708108  \n",
       "8   152.363158  172.634906  \n",
       "9    86.230576  176.160290  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top 5 feature contributions for the main prototypes:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>prototype_id</th>\n",
       "      <th>feature</th>\n",
       "      <th>contribution_score</th>\n",
       "      <th>interpreted_value</th>\n",
       "      <th>rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>47.354488</td>\n",
       "      <td>6.712051</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>data_arc</td>\n",
       "      <td>29.003140</td>\n",
       "      <td>253.67423</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>h</td>\n",
       "      <td>15.000282</td>\n",
       "      <td>18.005536</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>14.650509</td>\n",
       "      <td>18.095537</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3</td>\n",
       "      <td>i</td>\n",
       "      <td>5.876609</td>\n",
       "      <td>10.6704</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>om</td>\n",
       "      <td>2.823642</td>\n",
       "      <td>84.040274</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>1.396686</td>\n",
       "      <td>0.18725</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>w</td>\n",
       "      <td>0.829923</td>\n",
       "      <td>205.94687</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>0.577087</td>\n",
       "      <td>199.988755</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0</td>\n",
       "      <td>n</td>\n",
       "      <td>0.460764</td>\n",
       "      <td>0.226685</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>4</td>\n",
       "      <td>w</td>\n",
       "      <td>2.976050</td>\n",
       "      <td>88.600346</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>4</td>\n",
       "      <td>om</td>\n",
       "      <td>2.472264</td>\n",
       "      <td>244.593422</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>4</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>1.299007</td>\n",
       "      <td>0.237537</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>4</td>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>0.579315</td>\n",
       "      <td>198.257712</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>4</td>\n",
       "      <td>ma</td>\n",
       "      <td>0.565179</td>\n",
       "      <td>162.908708</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>2</td>\n",
       "      <td>n_obs_used</td>\n",
       "      <td>16.539234</td>\n",
       "      <td>1149.485048</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>2</td>\n",
       "      <td>data_arc</td>\n",
       "      <td>11.543415</td>\n",
       "      <td>12629.802255</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>2</td>\n",
       "      <td>h</td>\n",
       "      <td>10.733020</td>\n",
       "      <td>13.951356</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>2</td>\n",
       "      <td>q</td>\n",
       "      <td>4.311334</td>\n",
       "      <td>3.780162</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>2</td>\n",
       "      <td>moid</td>\n",
       "      <td>4.298770</td>\n",
       "      <td>2.791547</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>1</td>\n",
       "      <td>w</td>\n",
       "      <td>3.681780</td>\n",
       "      <td>277.27254</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>1</td>\n",
       "      <td>om</td>\n",
       "      <td>3.497154</td>\n",
       "      <td>259.451502</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>1.538064</td>\n",
       "      <td>0.236619</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1</td>\n",
       "      <td>e</td>\n",
       "      <td>1.335589</td>\n",
       "      <td>0.124196</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1</td>\n",
       "      <td>i</td>\n",
       "      <td>0.946616</td>\n",
       "      <td>7.349716</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>5</td>\n",
       "      <td>e</td>\n",
       "      <td>51.163396</td>\n",
       "      <td>0.594384</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>5</td>\n",
       "      <td>h</td>\n",
       "      <td>29.822758</td>\n",
       "      <td>21.763982</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>5</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>16.744631</td>\n",
       "      <td>5.188867</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>5</td>\n",
       "      <td>data_arc</td>\n",
       "      <td>11.857491</td>\n",
       "      <td>1152.310966</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>5</td>\n",
       "      <td>neo</td>\n",
       "      <td>9.156986</td>\n",
       "      <td>Y</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>7</td>\n",
       "      <td>h</td>\n",
       "      <td>39.831296</td>\n",
       "      <td>22.939038</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>7</td>\n",
       "      <td>e</td>\n",
       "      <td>38.029636</td>\n",
       "      <td>0.457377</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>7</td>\n",
       "      <td>n</td>\n",
       "      <td>30.939406</td>\n",
       "      <td>0.455755</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>7</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>15.224177</td>\n",
       "      <td>4.561035</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>7</td>\n",
       "      <td>neo</td>\n",
       "      <td>11.453909</td>\n",
       "      <td>Y</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>9</td>\n",
       "      <td>i</td>\n",
       "      <td>32.003468</td>\n",
       "      <td>34.471053</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>9</td>\n",
       "      <td>n</td>\n",
       "      <td>26.911428</td>\n",
       "      <td>0.508083</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>9</td>\n",
       "      <td>e</td>\n",
       "      <td>18.966981</td>\n",
       "      <td>0.370235</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>9</td>\n",
       "      <td>h</td>\n",
       "      <td>17.865198</td>\n",
       "      <td>20.717401</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>9</td>\n",
       "      <td>condition_code</td>\n",
       "      <td>9.099745</td>\n",
       "      <td>4.069767</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    prototype_id         feature  contribution_score interpreted_value  rank\n",
       "0              3  condition_code           47.354488          6.712051     1\n",
       "1              3        data_arc           29.003140         253.67423     2\n",
       "2              3               h           15.000282         18.005536     3\n",
       "3              3      n_obs_used           14.650509         18.095537     4\n",
       "4              3               i            5.876609           10.6704     5\n",
       "5              0              om            2.823642         84.040274     1\n",
       "6              0  condition_code            1.396686           0.18725     2\n",
       "7              0               w            0.829923         205.94687     3\n",
       "8              0      n_obs_used            0.577087        199.988755     4\n",
       "9              0               n            0.460764          0.226685     5\n",
       "10             4               w            2.976050         88.600346     1\n",
       "11             4              om            2.472264        244.593422     2\n",
       "12             4  condition_code            1.299007          0.237537     3\n",
       "13             4      n_obs_used            0.579315        198.257712     4\n",
       "14             4              ma            0.565179        162.908708     5\n",
       "15             2      n_obs_used           16.539234       1149.485048     1\n",
       "16             2        data_arc           11.543415      12629.802255     2\n",
       "17             2               h           10.733020         13.951356     3\n",
       "18             2               q            4.311334          3.780162     4\n",
       "19             2            moid            4.298770          2.791547     5\n",
       "20             1               w            3.681780         277.27254     1\n",
       "21             1              om            3.497154        259.451502     2\n",
       "22             1  condition_code            1.538064          0.236619     3\n",
       "23             1               e            1.335589          0.124196     4\n",
       "24             1               i            0.946616          7.349716     5\n",
       "25             5               e           51.163396          0.594384     1\n",
       "26             5               h           29.822758         21.763982     2\n",
       "27             5  condition_code           16.744631          5.188867     3\n",
       "28             5        data_arc           11.857491       1152.310966     4\n",
       "29             5             neo            9.156986                 Y     5\n",
       "30             7               h           39.831296         22.939038     1\n",
       "31             7               e           38.029636          0.457377     2\n",
       "32             7               n           30.939406          0.455755     3\n",
       "33             7  condition_code           15.224177          4.561035     4\n",
       "34             7             neo           11.453909                 Y     5\n",
       "35             9               i           32.003468         34.471053     1\n",
       "36             9               n           26.911428          0.508083     2\n",
       "37             9               e           18.966981          0.370235     3\n",
       "38             9               h           17.865198         20.717401     4\n",
       "39             9  condition_code            9.099745          4.069767     5"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Матрица ошибок на test:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>48845</td>\n",
       "      <td>1033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       0     1\n",
       "0  48845  1033\n",
       "1      1   121"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Первые 10 фактических и предсказанных значений на test:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>object_name</th>\n",
       "      <th>y_true</th>\n",
       "      <th>y_pred</th>\n",
       "      <th>y_pred_proba</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000041</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000245</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Dinotina</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.001224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000040</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>nan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000272</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  object_name  y_true  y_pred  y_pred_proba\n",
       "0         nan       0       0      0.000041\n",
       "1         nan       0       0      0.000116\n",
       "2         nan       0       0      0.000245\n",
       "3         nan       0       0      0.000022\n",
       "4         nan       0       0      0.000150\n",
       "5         nan       0       0      0.000084\n",
       "6         nan       0       0      0.000121\n",
       "7    Dinotina       0       0      0.001224\n",
       "8         nan       0       0      0.000040\n",
       "9         nan       0       0      0.000272"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# === КЛЮЧЕВАЯ ИНФОРМАЦИЯ ДЛЯ ВЫВОДА ===\n",
    "print(\"=== КЛЮЧЕВАЯ ИНФОРМАЦИЯ ДЛЯ ВЫВОДА ===\")\n",
    "print(f\"Зависимая переменная: {target_col}\")\n",
    "print()\n",
    "print(f\"Число исходных признаков: {X.shape[1]}\")\n",
    "print(f\"Число признаков после кодирования: {X_train_p.shape[1]}\")\n",
    "print(f\"Лучшая конфигурация прототипной сети:\")\n",
    "print(f\"n_prototypes_per_class = {int(best_params['n_prototypes_per_class'])}\")\n",
    "print(f\"total_prototypes = {int(prototype_df.shape[0])}\")\n",
    "print(f\"sigma_multiplier = {float(best_params['sigma_multiplier'])}\")\n",
    "print(f\"sigma = {float(final_model.sigma_)}\")\n",
    "print(f\"C = {float(best_params['C'])}\")\n",
    "print()\n",
    "\n",
    "print(\"Распределение классов:\")\n",
    "display(y.value_counts(normalize=True).sort_index().rename(\"share\").to_frame())\n",
    "\n",
    "print(\"Использованные исходные признаки:\")\n",
    "display(pd.DataFrame({\"feature\": X.columns.tolist()}))\n",
    "\n",
    "print(\"Метрики модели:\")\n",
    "display(results_df)\n",
    "\n",
    "print(\"Топ-10 конфигураций по validation:\")\n",
    "display(search_df.head(10))\n",
    "\n",
    "print(\"Топ-10 признаков по важности:\")\n",
    "display(perm_df.head(10))\n",
    "\n",
    "print(\"Основные прототипы:\")\n",
    "display(prototype_df.head(10))\n",
    "\n",
    "print(\"Топ-5 вкладов признаков для основных прототипов:\")\n",
    "display(prototype_contrib_df)\n",
    "\n",
    "print(\"Матрица ошибок на test:\")\n",
    "display(pd.DataFrame(cm))\n",
    "\n",
    "print(\"Первые 10 фактических и предсказанных значений на test:\")\n",
    "display(pred_df.head(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aaf705be",
   "metadata": {},
   "source": [
    "Итог\n",
    "\n",
    "Зависимой переменной выбрана pha. Прототипная нейросеть с 10 прототипами (по 5 на класс) показала на test accuracy = 0.979, balanced accuracy = 0.986, F1-macro = 0.590, ROC-AUC = 0.996. С учётом сильного дисбаланса классов (0.245% опасных объектов) модель даёт высокое качество именно по сбалансированным метрикам, а не только по accuracy.\n",
    "\n",
    "Наиболее важные признаки: e, condition_code, data_arc, n, n_obs_used. Для опасных астероидов прототипы характеризуются, как правило, более высокой эксцентриситетностью, повышенными значениями condition_code, меньшей наблюдательной историей (data_arc, n_obs_used) и признаками класса neo = Y. Для неопасных объектов характерны более стабильные орбитальные параметры и значительно большая представленность в обучающей выборке.\n",
    "\n",
    "Следовательно, прототипная сеть выделяет интерпретируемые группы астероидов, а ключевой вклад в распознавание опасных объектов вносят орбитальная эксцентриситетность и характеристики наблюдаемости орбиты."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
