- Introduction
- Dataset Description
- Data Preparation and Exploratory Analysis
- Installing Oracle AI Database 26ai with Docker
- Creating the Application User
- Creating the Database Schema
- Loading the Dataset into Oracle Database
- Python Analytics Pipeline
- Writing Analytical Results Back to Oracle
- Connecting Oracle APEX to the Database
- Oracle APEX Analytical Dashboards
- Advantages of the Architecture
Introduction
Artificial intelligence (AI) is increasingly transforming healthcare by enabling the analysis of complex biomedical datasets and supporting advances in precision medicine. One important application is the classification of cancer types using gene expression data.
Gene expression datasets measure the activity of thousands of genes simultaneously, allowing researchers to identify molecular patterns associated with specific tumour types. These datasets are typically extremely high dimensional and require advanced machine learning techniques to extract meaningful insights.
The dataset used in this project contains 802 tumour samples and more than 20,000 gene expression features, representing five cancer types:
- BRCA – Breast invasive carcinoma
- KIRC – Kidney renal clear cell carcinoma
- COAD – Colon adenocarcinoma
- LUAD – Lung adenocarcinoma
- PRAD – Prostate adenocarcinoma
In the initial implementation, data analysis was performed using Python libraries including NumPy, Pandas, SciPy, Matplotlib, Seaborn, and Scikit-learn (Pedregosa et al., 2011) within a Jupyter Notebook environment.
Version 2 of this project extends the analytical workflow by integrating:
- Oracle AI Database 26ai
- Oracle Application Express (APEX)
- Python machine learning pipelines
- Containerised deployment using Docker
This architecture demonstrates how machine learning analytics can evolve into a scalable biomedical data platform.
Installing Oracle AI Database 26ai Locally with Docker
Before loading the dataset or creating database tables, the Oracle database environment must be installed.
To run Oracle locally, I used the Oracle AI Database Free Docker container provided through the Oracle Container Registry to ensure reproducibility (Merkel, 2014).
The container contains a complete Oracle database installation bundled with Oracle Linux and is designed for development and learning environments.
The image is relatively large because it includes the full database stack and operating system components. In practice, the download size is typically around 9–9.5 GB.
The container is distributed as a multi-architecture image, meaning it can run on both x86_64 systems and ARM64 platforms such as Apple Silicon MacBooks.
Verifying Docker Installation
Before running Oracle containers, I verified that Docker was installed correctly.
% docker --version
Docker version 29.2.1, build a5c7197
I also tested Docker by running a simple container:
% docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
xxxxxxxx: Pull complete
Digest: xxxxxxxxxxxxx
Status: Downloaded newer image for hello-world:latest
Docker successfully pulled the hello-world image and executed the test container, confirming that the environment was configured correctly.
Downloading the Oracle Database Container
Next, I downloaded the Oracle AI Database container image:
% docker pull container-registry.oracle.com/database/free:latest
latest: Pulling from database/free
3a68a0fbce6c: Pull complete
e43f6e518290: Pull complete
8a49200a20c1: Downloading [==========> ] 210.9MB/975.9MB
846136329409: Downloading [=> ] 36.77MB/1.621GB
00bb9bec3644: Download complete
c2d0491c1211: Download complete
97830dd37452: Download complete
906a752075da: Download complete
d107ccb8dac8: Download complete
732076e3fb39: Download complete
c42c4cff24fa: Download complete
322b4d8e3b1b: Download complete
6f1f0629e77e: Download complete
ff1115604a7b: Download complete
9a21c00d6ab7: Downloading [============> ] 132.5MB/511.7MB
32afc186869b: Waiting
d3e77c9f993e: Waiting
cf3af96c84f8: Waiting
ce78d0df9a03: Waiting
bc33b1d1323b: Waiting
ce773ccc2f4f: Waiting
66735cc652fb: Waiting
….
After the download completed, re-running the pull command reported every layer as "Pull complete":
% docker pull container-registry.oracle.com/database/free:latest
latest: Pulling from database/free
3a68a0fbce6c: Pull complete
…
66735cc652fb: Pull complete
Digest: sha256:xxxxxxxxxxxxxxxxxxxxxxxxx
Status: Downloaded newer image for container-registry.oracle.com/database/free:latest
container-registry.oracle.com/database/free:latest
Starting the Oracle Database Container
After downloading the container image, I launched the Oracle database container:
% docker run -d --name oracledb \
-p 1521:1521 \
-p 5500:5500 \
container-registry.oracle.com/database/free:latest
xxxxxxxxxxxxxxxxxxxxxxx
%
The container exposes two important ports:
| Port | Purpose |
| --- | --- |
| 1521 | Oracle database listener |
| 5500 | Oracle Enterprise Manager Database Express |
The default configuration includes:
- SID: FREE
- Pluggable database: FREEPDB1
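These defaults are exactly what the Python layer later needs to build its connect string. A minimal sketch of how they combine into an EZConnect-style DSN (the helper name is illustrative, not part of the image or driver):

```python
# Build the host:port/service_name connect string for the container's
# default pluggable database. Defaults mirror the values listed above.
def free_pdb_dsn(host="localhost", port=1521, service="FREEPDB1"):
    return f"{host}:{port}/{service}"

print(free_pdb_dsn())  # localhost:1521/FREEPDB1
```

The same string appears later as the `dsn` argument passed to the python-oracledb driver.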
Checking the Container Image
To verify the downloaded container image and its size, I ran:
% docker images
i Info → U In Use
IMAGE ID DISK USAGE CONTENT SIZE EXTRA
container-registry.oracle.com/database/free:latest bxxxxxx 9.69GB 0B U
hello-world:latest cxxxxxx 5.2kB 0B U
Verifying the Container Status
To confirm that the Oracle container is running:
% docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
exxxxxxx container-registry.oracle.com/database/free:latest "/bin/bash -c $ORACL…" 2 minutes ago Up 2 minutes (healthy) 0.0.0.0:1521->1521/tcp, [::]:1521->1521/tcp, 0.0.0.0:5500->5500/tcp, [::]:5500->5500/tcp oracledb
The container shows a healthy status, indicating that the internal Oracle startup scripts completed successfully.
Inspecting the Container Configuration
To view detailed configuration information about the container:
% docker inspect oracledb
[
{
"Id": "e2136xxxxxxxxxxxxxxxxxxxxxxx",
"Created": "2026-03-15T19:21:09.773725255Z",
"Path": "/bin/bash",
"Args": [
"-c",
"$ORACLE_BASE/$RUN_FILE"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
….
"Gateway": "172.xx.0.1",
"IPAddress": "172.xx.0.2",
"MacAddress": "06:xx:6e:xx:02:0e",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"DNSNames": null
}
}
}
The output shows the detailed configuration and runtime state of my Oracle database container. The container has the ID e21xxxxxx and is currently running and healthy, indicating that the internal health checks are passing and the database services started successfully. It was created and started on March 15, 2026, and is executing a startup script that launches the Oracle database environment inside the container.

The system is running on a Linux platform using the overlay2 storage driver, which manages the container's layered filesystem. The configuration exposes the standard Oracle ports: 1521 for database connections and 5500 for the web-based management interface. Internally, the container runs under the oracle user and sets environment variables such as ORACLE_BASE=/opt/oracle and ORACLE_HOME=/opt/oracle/product/26ai/dbhomeFree, indicating that the image contains Oracle AI Database 26ai Free.

The container is connected to Docker's default bridge network, where it received the internal IP address 172.xx.0.2, allowing it to communicate with other containers while also mapping its ports to the host machine. The metadata labels confirm that the image is maintained by Oracle and distributed as the Oracle AI Database 26ai Free edition, which is intended for development and learning purposes.
Checking Docker Disk Usage
To review the container storage usage:
% docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 2 2 9.693GB 9.693GB (100%)
Containers 2 1 4.269GB 0B (0%)
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B
Accessing the Container
Once the container is running, I log into it to interact directly with the operating system inside the container. To do this, I used the command docker exec -it oracledb bash, which opens an interactive terminal session within the container. After executing the command, the prompt changes to bash-4.4$, indicating that I am now inside the container as the oracle user, working within the Oracle Linux environment where the Oracle database software is installed and running. From this point, I can execute system commands, check database processes, or interact with the Oracle environment directly.
% docker exec -it oracledb bash
bash-4.4$
bash-4.4$
bash-4.4$ hostname
exxxxxx
bash-4.4$ whoami
oracle
After logging into the container, I verify that the Oracle database instance is running by checking the background processes associated with the database. To do this, I run the command ps -ef | grep pmon, which searches for the PMON (Process Monitor) process. The output shows a process named db_pmon_FREE running under the oracle user, which confirms that the Oracle database instance with the SID FREE is active and operating correctly. The PMON process is an essential background component of Oracle that monitors other processes and performs recovery tasks if necessary. Seeing this process in the output indicates that the database has successfully started inside the container.
bash-4.4$ ps -ef | grep pmon
oracle 36 1 0 19:21 ? 00:00:00 db_pmon_FREE
oracle 838 776 0 19:28 pts/0 00:00:00 grep pmon
bash-4.4$
After confirming that the database processes are running, I proceed to connect to the Oracle database from inside the container. First, I set the environment variable for the database instance by running export ORACLE_SID=FREE, which tells the Oracle tools which database instance to target. Then I start the SQL*Plus client using sqlplus "/as sysdba", which connects with SYSDBA administrative privileges. SQL*Plus starts and displays the version information, indicating SQL*Plus Release 23.26.1.0.0, which corresponds to the Oracle AI Database Free environment packaged in the container. After the client initializes, it prompts for a password to complete the authentication process, allowing administrative access to the database once the correct credentials are provided.
bash-4.4$ sqlplus "/as sysdba"
SQL*Plus: Release 23.26.1.0.0 - Production on Sun Mar 15 19:30:21 2026
Version 23.26.1.0.0
Copyright (c) 1982, 2025, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle AI Database 26ai Free Release 23.26.1.0.0 - Develop, Learn, and Run for Free
Version 23.26.1.0.0
SQL>
After successfully connecting to the Oracle database using SQL*Plus, I verify that the database instance is running properly from within the SQL prompt. Once connected, the prompt changes to SQL>, indicating that I am now interacting directly with the Oracle database engine. To confirm that the instance is open and operating correctly, I run SELECT instance_name, status FROM v$instance;, which reports the current state of the database instance. If everything is functioning as expected, the STATUS column returns OPEN, meaning the database is fully started and available for connections and queries. This step ensures that the Oracle AI Database Free instance inside the container is running correctly before performing administrative tasks or executing SQL commands.
SQL> SELECT instance_name, status FROM v$instance;
INSTANCE_NAME STATUS
---------------- ------------
FREE OPEN
Creating an Application User
After confirming that the Oracle AI Database instance is running, the next step is to create a dedicated application user that will own the tables used in the cancer gene expression analysis.
Using a separate schema improves security and keeps application objects isolated from system accounts.
First, connect to the database as SYSDBA and switch to the pluggable database.
Inside SQL*Plus:
SQL> ALTER SESSION SET CONTAINER=FREEPDB1;
Session altered.
SQL> SHOW CON_NAME;
CON_NAME
------------------------------
FREEPDB1
SQL>
Create the Application User
Now create the user normally (inside a pluggable database, the C## common-user prefix is not required):
SQL>
SQL> CREATE USER ai_healthcare
IDENTIFIED BY xxxxxx
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp
QUOTA UNLIMITED ON users;
User created.
Grant Required Privileges
SQL> GRANT CREATE SESSION TO ai_healthcare;
Grant succeeded.

SQL> GRANT CREATE TABLE TO ai_healthcare;
Grant succeeded.

SQL> GRANT CREATE VIEW TO ai_healthcare;
Grant succeeded.

SQL> GRANT CREATE SEQUENCE TO ai_healthcare;
Grant succeeded.

SQL> GRANT CREATE PROCEDURE TO ai_healthcare;
Grant succeeded.
Connect as the Application User
SQL>
SQL> CONNECT ai_healthcare/xxxxx@FREEPDB1
Connected.
SQL>
Creating the Database Schema
Once the Oracle database is running, the next step is to create the tables used to store the gene expression dataset.
CREATE TABLE patient_labels (
sample_id VARCHAR2(50) PRIMARY KEY,
cancer_type VARCHAR2(10)
);
CREATE TABLE gene_expression (
sample_id VARCHAR2(50),
gene_id VARCHAR2(50),
expression_value NUMBER,
FOREIGN KEY (sample_id)
REFERENCES patient_labels(sample_id)
);
Dataset Preparation
The dataset consists of two files:
- data.csv — gene expression measurements
- labels.csv — tumour classification labels
These files are merged on the sample identifier to associate gene expression values with tumour types.
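The merge step above can be sketched with pandas. The tiny in-memory frames below stand in for the real CSV files; the column names "Unnamed: 0" (sample identifier) and "Class" (tumour label) match the layout used later in the loading code.

```python
import pandas as pd

# Stand-in for data.csv: one row per sample, one column per gene.
data_df = pd.DataFrame({
    "Unnamed: 0": ["sample_0", "sample_1"],
    "gene_0": [0.0, 1.2],
    "gene_1": [2.3, 0.4],
})
# Stand-in for labels.csv: one tumour label per sample.
labels_df = pd.DataFrame({
    "Unnamed: 0": ["sample_0", "sample_1"],
    "Class": ["PRAD", "LUAD"],
})

# Inner-join on the sample identifier so each expression row carries its label.
merged = data_df.merge(labels_df, on="Unnamed: 0", how="inner")
print(merged.shape)  # (2, 4)
```

An inner join is the safe default here: a sample missing from either file is dropped rather than producing rows with missing labels.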
Machine Learning Analysis
After loading the dataset, the analysis pipeline performs several steps:
- feature standardisation using StandardScaler
- dimensionality reduction using PCA
- visualisation using t-SNE
- supervised reduction using LDA
- clustering using K-means
- statistical testing using ANOVA
These techniques reveal patterns within the genomic dataset and help identify relationships between tumour samples.
Loading the Dataset into Oracle Database
Once the application user and schema have been created, the next step is to load the dataset files into Oracle.
The project uses two source files:
- labels.csv, containing the tumour classification for each sample
- data.csv, containing gene expression measurements for each tumour sample
Because data.csv is a wide file with thousands of gene columns, the ingestion process is conceptually divided into two stages:
- load the labels file into patient_labels
- transform the gene expression matrix into a relational structure and load it into gene_expression
In this prototype, Python is used as the data loading and transformation layer.
Connecting Python to Oracle
First, start the Jupyter Notebook server from the terminal:
% python3 -m notebook
...
[I 2026-03-16 09:23:45.955 ServerApp] Extension package jupyter_lsp took 0.1259s to import
[I 2026-03-16 09:23:46.256 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2026-03-16 09:23:46.260 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2026-03-16 09:23:46.264 ServerApp] jupyterlab | extension was successfully linked.
...
[I 2026-03-16 09:23:48.013 ServerApp] Skipped non-installed server(s): basedpyright, bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyrefly, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server

import pandas as pd
import oracledb
connection = oracledb.connect(
user="ai_healthcare",
password="xxxxxxxx",
dsn="localhost:1521/FREEPDB1"
)
The Python pipeline connects to Oracle using the oracledb library.
This connection allows Python to read the CSV files, transform them, and insert the records into the Oracle database.
Loading the Labels File
The tumour labels are loaded first.
labels_df = pd.read_csv("labels.csv")
labels_df.columns = [c.lower() for c in labels_df.columns]
print(labels_df.head())

Assuming the file contains the sample identifier and tumour class, the rows can be inserted into the patient_labels table.
cursor = connection.cursor()
insert_labels_sql = """
INSERT INTO patient_labels (sample_id, cancer_type)
VALUES (:1, :2)
"""
label_rows = [
(row["unnamed: 0"], row["class"])
for _, row in labels_df.iterrows()
]
cursor.executemany(insert_labels_sql, label_rows)
connection.commit()

This step stores the tumour labels in Oracle so that each sample can later be linked to its corresponding gene expression values.
Loading the Gene Expression Data
The data.csv file contains one row per tumour sample and thousands of gene columns. To store this in a normalised Oracle schema, the dataset is reshaped from a wide format into a long format.
data_df = pd.read_csv("data.csv")
data_df.columns = [c.lower() for c in data_df.columns]
print(data_df.shape)

A typical row in data.csv contains:
- the sample identifier
- one column for each gene expression value
To load it into the gene_expression table, the dataset is unpivoted.
id_col = "unnamed: 0"  # column names were lowercased above
long_df = data_df.melt(
id_vars=[id_col],
var_name="gene_id",
value_name="expression_value"
)
long_df.rename(columns={id_col: "sample_id"}, inplace=True)
print(long_df.head())

After transformation, each row represents a single gene expression measurement for one sample.
Example structure: (sample_id, gene_id, expression_value).
These rows can then be inserted into Oracle.
insert_expr_sql = """
INSERT INTO gene_expression (sample_id, gene_id, expression_value)
VALUES (:1, :2, :3)
"""
expr_rows = list(long_df.itertuples(index=False, name=None))
cursor.executemany(insert_expr_sql, expr_rows)
connection.commit()

At this point, the dataset has been embedded into the Oracle database in a form that supports SQL queries, analytics, and downstream machine learning workflows.
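With roughly 800 samples and over 20,000 genes, the long table holds on the order of 16 million rows, so passing the entire list to a single executemany call can be memory-hungry. A batched variant is a common refinement; the helper name, batch size, and stand-in cursor below are illustrative, not part of the original pipeline:

```python
# Hypothetical batching helper: splits the row list into fixed-size chunks
# and calls cursor.executemany once per chunk.
def insert_in_batches(cursor, sql, rows, batch_size=50_000):
    total = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cursor.executemany(sql, batch)
        total += len(batch)
    return total

# Stand-in cursor that just records batch sizes, so the chunking logic
# can be demonstrated without a live database connection.
class FakeCursor:
    def __init__(self):
        self.batch_sizes = []
    def executemany(self, sql, batch):
        self.batch_sizes.append(len(batch))

cur = FakeCursor()
rows = [("s0", f"gene_{i}", 0.0) for i in range(120)]
inserted = insert_in_batches(cur, "INSERT ...", rows, batch_size=50)
print(inserted, cur.batch_sizes)  # 120 [50, 50, 20]
```

With a real python-oracledb cursor, committing once after all batches (as the pipeline does) keeps the load transactional while still bounding per-call memory.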
Retrieving the Dataset from Oracle for Analysis
Once the data is stored in Oracle, Python can query it directly for preprocessing and analysis.
query = """
SELECT g.sample_id,
p.cancer_type,
g.gene_id,
g.expression_value
FROM gene_expression g
JOIN patient_labels p
ON g.sample_id = p.sample_id
"""
df_long = pd.read_sql(query, connection)
print(df_long.head())

Because most machine learning algorithms expect a matrix format, the long dataset is pivoted back into a wide analytical matrix.
df_matrix = df_long.pivot(
index="SAMPLE_ID",
columns="GENE_ID",
values="EXPRESSION_VALUE"
)
labels_from_db = pd.read_sql(
"SELECT sample_id, cancer_type FROM patient_labels",
connection
)
df_matrix = df_matrix.merge(
labels_from_db,
left_index=True,
right_on="SAMPLE_ID"
)

This creates a dataset that can be used for feature scaling, dimensionality reduction, and clustering.
Python Analytics Pipeline
After loading the data from Oracle, the machine learning workflow proceeds in Python.
Standardisation
Gene expression values vary considerably across genes, so the features are standardised before analysis.
from sklearn.preprocessing import StandardScaler
X = df_matrix.drop(columns=["SAMPLE_ID", "CANCER_TYPE"])
y = df_matrix["CANCER_TYPE"]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
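As a quick illustration of what the scaler does: after fit_transform, every column has zero mean and unit variance, regardless of the original scale. A self-contained toy example:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two columns on very different scales, as gene expression values often are.
X_demo = np.array([[1.0, 200.0],
                   [2.0, 400.0],
                   [3.0, 600.0]])
X_demo_scaled = StandardScaler().fit_transform(X_demo)

print(X_demo_scaled.mean(axis=0))  # ~[0, 0]
print(X_demo_scaled.std(axis=0))   # ~[1, 1]
```

This matters for PCA in particular: without standardisation, high-variance genes would dominate the principal components purely because of their scale.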
Principal Component Analysis (PCA)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
pca_results = pd.DataFrame(X_pca, columns=["pca1", "pca2"])
pca_results["sample_id"] = df_matrix["SAMPLE_ID"].values
pca_results["CANCER_TYPE"] = y.values

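How much information two components retain can be read from explained_variance_ratio_. The synthetic example below (not the project's data) shows the typical pattern with strongly correlated features, which is exactly the situation in gene expression matrices:

```python
import numpy as np
from sklearn.decomposition import PCA

# Three features: two strongly correlated, one independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X_demo = np.hstack([
    base,
    2 * base + 0.01 * rng.normal(size=(100, 1)),  # near-copy of the first
    rng.normal(size=(100, 1)),                    # independent noise
])

# Two components are enough to capture almost all the variance here.
pca_demo = PCA(n_components=2).fit(X_demo)
print(pca_demo.explained_variance_ratio_.sum())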
t-SNE
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, learning_rate=50, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)
tsne_results = pd.DataFrame(X_tsne, columns=["tsne_x", "tsne_y"])
tsne_results["SAMPLE_ID"] = df_matrix["SAMPLE_ID"].values
tsne_results["CANCER_TYPE"] = y.values

K-Means Clustering
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_pca)
cluster_results = pd.DataFrame({
"sample_id": df_matrix["SAMPLE_ID"].values,
"cancer_type": y.values,
"cluster_id": clusters
})

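A natural follow-up question is how well the unsupervised clusters agree with the known cancer types. The adjusted Rand index (1.0 for perfect agreement, near 0 for random assignment) answers this; the synthetic two-blob example below shows the call, and applying adjusted_rand_score to `y` and `clusters` above is the analogous step in the pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Two clearly separated point clouds with known labels.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 0.3, (50, 2)),
                    rng.normal(5, 0.3, (50, 2))])
true_labels = np.array([0] * 50 + [1] * 50)

# Cluster without the labels, then compare against them.
pred = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X_demo)
ari = adjusted_rand_score(true_labels, pred)
print(ari)  # close to 1.0 for well-separated groups
```

ARI is invariant to cluster relabelling, so it does not matter that K-means numbers its clusters arbitrarily.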
Writing the Analytical Results Back to Oracle
To support Oracle APEX dashboards, the analytical results are stored back in the database (under the ai_healthcare user).
PCA results table
CREATE TABLE pca_results (
sample_id VARCHAR2(50) PRIMARY KEY,
cancer_type VARCHAR2(10),
pca1 NUMBER,
pca2 NUMBER
);
t-SNE results table
CREATE TABLE tsne_results (
sample_id VARCHAR2(50) PRIMARY KEY,
cancer_type VARCHAR2(10),
tsne_x NUMBER,
tsne_y NUMBER
);
Cluster results table
CREATE TABLE cluster_results (
sample_id VARCHAR2(50) PRIMARY KEY,
cancer_type VARCHAR2(10),
cluster_id NUMBER
);
The Python results can then be inserted into Oracle.
# Clear tables first
cursor.execute("TRUNCATE TABLE pca_results")
cursor.execute("TRUNCATE TABLE tsne_results")
cursor.execute("TRUNCATE TABLE cluster_results")
cursor.executemany(
"INSERT INTO pca_results (sample_id, cancer_type, pca1, pca2) VALUES (:1, :2, :3, :4)",
list(pca_results[["sample_id", "CANCER_TYPE", "pca1", "pca2"]].itertuples(index=False, name=None))
)
cursor.executemany(
"INSERT INTO tsne_results (sample_id, cancer_type, tsne_x, tsne_y) VALUES (:1, :2, :3, :4)",
list(tsne_results[["SAMPLE_ID", "CANCER_TYPE", "tsne_x", "tsne_y"]].itertuples(index=False, name=None))
)
cursor.executemany(
"INSERT INTO cluster_results (sample_id, cancer_type, cluster_id) VALUES (:1, :2, :3)",
list(cluster_results[["sample_id", "cancer_type", "cluster_id"]].itertuples(index=False, name=None))
)
connection.commit()

This step is what makes the Oracle APEX dashboards possible: the charts are not reading directly from the notebook, but from result tables persisted in Oracle.
Connecting Oracle APEX through Oracle REST Data Services (ORDS)
In earlier Oracle container deployments, database administration tools were accessed through Enterprise Manager Database Express using the endpoint:
http://localhost:5500/em
However, in Oracle AI Database 26ai, Enterprise Manager Database Express has been desupported (Oracle Corporation, 2026). As a result, the /em endpoint may not respond even when the database container is running correctly.
Instead, modern Oracle database deployments expose web-based tools through Oracle REST Data Services (ORDS). ORDS acts as the HTTP gateway between web applications and the Oracle database and enables access to Oracle Application Express (APEX) and REST APIs.
After installing ORDS and linking it to the database service FREEPDB1, the service becomes available through a local web server.
Accessing Oracle APEX through ORDS
Once ORDS is running, the APEX environment can be accessed through the browser using the following URL:
http://localhost:8080/ords

From this interface, developers can log in using the APEX administrator account and create workspaces connected to database schemas.
However, when APEX is not installed inside the pluggable database, the APEX tile may display “App Unavailable.”
To enable APEX, the Oracle APEX schema must be installed in the FREEPDB1 pluggable database.
Installing Oracle APEX in the Database
First connect to the database container:
docker exec -it oracledb sqlplus sys/XXXXXXXX@localhost:1521/FREEPDB1 as sysdba
Then run:
@apexins.sql SYSAUX SYSAUX TEMP /i/

SYS> SHOW CON_NAME;
CON_NAME
------------------------------
FREEPDB1
SYS> SELECT comp_id, comp_name, version, status
  2  FROM dba_registry
  3  WHERE comp_id = 'APEX';
COMP_ID   COMP_NAME     VERSION   STATUS
-------   -----------   -------   -------
APEX      Oracle APEX   24.2.0    VALID

Although Oracle APEX 24.2 was successfully installed and validated in the FREEPDB1 pluggable database, the APEX endpoint initially returned HTTP 404 through ORDS. Oracle documents that when ORDS is installed before APEX, ORDS can create placeholder metadata instead of binding to the real APEX schema. The issue was resolved by configuring the database pool with plsql.gateway.mode=proxied, running ords install repair, and ensuring that standalone.static.path pointed to the downloaded APEX images directory.



Connecting APEX to the AI_HEALTHCARE Schema
Inside the APEX administration interface, a workspace can be created to connect the APEX application environment to the schema used in the machine learning pipeline.
Inside APEX:
- Login as ADMIN
- Go to Manage Workspaces
- Create workspace
Workspace name: AI_HEALTHCARE_WS





Once the workspace is created, log into Oracle APEX using the workspace credentials.

Creating an APEX Application
After logging into the APEX workspace:
- Navigate to App Builder
- Click Create Application
- Provide a name:

- Add pages for visualisation, including:
- Interactive reports

- Charts
- Scatter plots
- Cluster visualisations
APEX will automatically use the database schema associated with the workspace.
Creating Dashboard Queries
Each dashboard component is powered by a SQL query executed directly against the Oracle database.
Dataset Overview
This report shows the number of tumour samples per cancer type.
SELECT cancer_type, COUNT(*) AS sample_count
FROM patient_labels
GROUP BY cancer_type
ORDER BY sample_count DESC;

In APEX, configure the chart as:
Label: CANCER_TYPE
Value: SAMPLE_COUNT
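The same overview can be cross-checked on the Python side with a pandas groupby. The tiny frame below is illustrative; in the real pipeline the rows would be read from the patient_labels table:

```python
import pandas as pd

# Illustrative stand-in for the patient_labels table.
labels_df = pd.DataFrame({
    "sample_id": ["s0", "s1", "s2", "s3"],
    "cancer_type": ["BRCA", "BRCA", "KIRC", "LUAD"],
})

# Count samples per cancer type, largest group first,
# mirroring the dashboard's GROUP BY / ORDER BY query.
sample_counts = (labels_df.groupby("cancer_type")
                          .size()
                          .sort_values(ascending=False))
print(sample_counts)
```

Comparing these counts with the APEX chart is a simple way to confirm that the dashboard and the notebook are reading the same data.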

PCA Visualization
Principal Component Analysis (PCA) is used to reduce the dimensionality of the gene expression dataset while preserving as much variance as possible (Jolliffe, 2002).
This scatter plot visualises the first two principal components computed in the Python pipeline.
SELECT pca1,
pca2,
cancer_type
FROM pca_results;
In APEX, configure the chart as:
Chart type: Scatter Plot
X-axis: PCA1
Y-axis: PCA2
Group by: CANCER_TYPE



Each color/shape represents a different cancer type, and the axes represent the first two principal components from the dimensionality reduction pipeline.

The scatter plot displays tumor samples projected onto the first two principal components derived from the gene expression dataset. Each point represents a sample, colored according to its cancer type. The PCA transformation reveals clustering patterns among tumor classes. In particular, kidney renal clear cell carcinoma (KIRC) forms a well-separated cluster, indicating strong differentiation in the reduced feature space, while other cancer types exhibit partial overlap.
t-SNE Cluster Visualization
To better visualize local structure among tumour samples, I applied the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm (van der Maaten & Hinton, 2008).
This chart visualises tumour clusters using t-SNE coordinates.
SELECT
tsne_x,
tsne_y,
cancer_type
FROM tsne_results;



Benefits of Using Oracle APEX for Biomedical Analytics
Integrating APEX into the architecture provides several advantages:
Interactive data exploration
Researchers can explore tumour clusters visually without running Python code.
Real-time database queries
APEX dashboards always reflect the latest results stored in Oracle.
Rapid development
Low-code tools allow dashboards to be created quickly using SQL queries.
Enterprise integration
APEX applications can be integrated with hospital systems, research databases, and clinical analytics platforms.
Final Architecture Overview
The complete Version 2 architecture consists of:
- Python analytics pipeline
- Data preprocessing
- PCA
- t-SNE
- Oracle AI Database 26ai
- Stores genomic data
- Stores machine learning results
- Oracle APEX dashboards
- Visualisation
- Interactive exploration
- Research reporting
GitHub: https://github.com/brunorsreis/-cancer-gene-expression-classification-version2
This architecture demonstrates how machine learning workflows can evolve into a scalable platform combining AI analytics, enterprise databases, and interactive data applications.
References
Hastie, T., Tibshirani, R., & Friedman, J. (2009).
The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Jolliffe, I. T. (2002).
Principal component analysis (2nd ed.). Springer.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, E. (2011).
Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Van der Maaten, L., & Hinton, G. (2008).
Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Merkel, D. (2014).
Docker: Lightweight Linux containers for consistent development and deployment. Linux Journal.
Oracle Corporation. (2026).
Oracle AI Database Documentation: Database Upgrade Guide, "Oracle Database Changes, Desupports, and Deprecations". Oracle.


*The views expressed here are my own and do not represent those of my employer.*
Hello, I’m Bruno — a dual citizen of Brazil and Sweden. I bring a global perspective shaped by experiences in both South America and Europe, with a strong focus on collaboration and innovation across cultures. I am a Computer Scientist, PhD Candidate in Information and Communication Technologies, focusing on Data Science and Artificial Intelligence, and hold dual Master’s degrees in Data Science and Cybersecurity. With over fifteen years of international experience spanning Brazil, Hungary, and Sweden, I have collaborated with global organizations such as IBM, Playtech, and Oracle, as well as contributed remotely to projects across multiple regions. My professional interests include Databases, Cybersecurity, Cloud Computing, Data Science, Data Engineering, Big Data, Artificial Intelligence, Programming, and Software Engineering, all driven by a deep passion for transforming data into strategic business value.