- Introduction
- Dataset Description
- Data Preparation and Exploratory Analysis
- Installing Oracle AI Database 26ai with Docker
- Creating the Application User
- Creating the Database Schema
- Loading the Dataset into Oracle Database
- Python Analytics Pipeline
- Writing Analytical Results Back to Oracle
- Connecting Oracle APEX to the Database
- Oracle APEX Analytical Dashboards
- Advantages of the Architecture
Introduction
Artificial intelligence (AI) is increasingly transforming healthcare by enabling the analysis of complex biomedical datasets and supporting advances in precision medicine. One important application is the classification of cancer types using gene expression data.
Gene expression datasets measure the activity of thousands of genes simultaneously, allowing researchers to identify molecular patterns associated with specific tumour types. These datasets are typically extremely high dimensional and require advanced machine learning techniques to extract meaningful insights.
The dataset used in this project contains 802 tumour samples and more than 20,000 gene expression features, representing five cancer types:
- BRCA – Breast invasive carcinoma
- KIRC – Kidney renal clear cell carcinoma
- COAD – Colon adenocarcinoma
- LUAD – Lung adenocarcinoma
- PRAD – Prostate adenocarcinoma
In the initial implementation, data analysis was performed using Python libraries including NumPy, Pandas, SciPy, Matplotlib, Seaborn, and Scikit-learn (Pedregosa et al., 2011) within a Jupyter Notebook environment.
Version 2 of this project extends the analytical workflow by integrating:
- Oracle AI Database 26ai
- Oracle Application Express (APEX)
- Python machine learning pipelines
- Containerised deployment using Docker
This architecture demonstrates how machine learning analytics can evolve into a scalable biomedical data platform.
Installing Oracle AI Database 26ai Locally with Docker
Before loading the dataset or creating database tables, the Oracle database environment must be installed.
To run Oracle locally, I used the Oracle AI Database Free Docker container provided through the Oracle Container Registry to ensure reproducibility (Merkel, 2014).
The container contains a complete Oracle database installation bundled with Oracle Linux and is designed for development and learning environments.
The image is relatively large because it includes the full database stack and operating system components. In practice, the download size is typically around 9–9.5 GB.
The container is distributed as a multi-architecture image, meaning it can run on both x86_64 systems and ARM64 platforms such as Apple Silicon MacBooks.
Verifying Docker Installation
Before running Oracle containers, I verified that Docker was installed correctly.
% docker --version
Docker version 29.2.1, build a5c7197
I also tested Docker by running a simple container:
% docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
xxxxxxxx: Pull complete
Digest: xxxxxxxxxxxxx
Status: Downloaded newer image for hello-world:latest
Docker successfully pulled the hello-world image and executed the test container, confirming that the environment was configured correctly.
Downloading the Oracle Database Container
Next, I downloaded the Oracle AI Database container image:
% docker pull container-registry.oracle.com/database/free:latest
latest: Pulling from database/free
3a68a0fbce6c: Pull complete
e43f6e518290: Pull complete
8a49200a20c1: Downloading [==========> ] 210.9MB/975.9MB
846136329409: Downloading [=> ] 36.77MB/1.621GB
00bb9bec3644: Download complete
c2d0491c1211: Download complete
97830dd37452: Download complete
906a752075da: Download complete
d107ccb8dac8: Download complete
732076e3fb39: Download complete
c42c4cff24fa: Download complete
322b4d8e3b1b: Download complete
6f1f0629e77e: Download complete
ff1115604a7b: Download complete
9a21c00d6ab7: Downloading [============> ] 132.5MB/511.7MB
32afc186869b: Waiting
d3e77c9f993e: Waiting
cf3af96c84f8: Waiting
ce78d0df9a03: Waiting
bc33b1d1323b: Waiting
ce773ccc2f4f: Waiting
66735cc652fb: Waiting
….
After the download completed, re-running the pull command reported every layer as "Pull complete":
% docker pull container-registry.oracle.com/database/free:latest
latest: Pulling from database/free
3a68a0fbce6c: Pull complete
…
66735cc652fb: Pull complete
Digest: sha256:xxxxxxxxxxxxxxxxxxxxxxxxx
Status: Downloaded newer image for container-registry.oracle.com/database/free:latest
container-registry.oracle.com/database/free:latest
Starting the Oracle Database Container
After downloading the container image, I launched the Oracle database container:
% docker run -d --name oracledb \
-p 1521:1521 \
-p 5500:5500 \
container-registry.oracle.com/database/free:latest
xxxxxxxxxxxxxxxxxxxxxxx
%
The container exposes two important ports:
| Port | Purpose |
| --- | --- |
| 1521 | Oracle database listener |
| 5500 | Oracle Enterprise Manager Database Express |
The default configuration includes:
- SID: FREE
- Pluggable database: FREEPDB1
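These defaults are exactly what the Python layer later needs to build its connect string. A minimal sketch of how they combine into an EZConnect-style DSN (the helper name is illustrative, not part of the image or driver):

```python
# Build the host:port/service_name connect string for the container's
# default pluggable database. Defaults mirror the values listed above.
def free_pdb_dsn(host="localhost", port=1521, service="FREEPDB1"):
    return f"{host}:{port}/{service}"

print(free_pdb_dsn())  # localhost:1521/FREEPDB1
```

The same string appears later as the `dsn` argument passed to the python-oracledb driver.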
Checking the Container Image
To verify the downloaded container image and its size, I ran:
% docker images
i Info → U In Use
IMAGE ID DISK USAGE CONTENT SIZE EXTRA
container-registry.oracle.com/database/free:latest bxxxxxx 9.69GB 0B U
hello-world:latest cxxxxxx 5.2kB 0B U
Verifying the Container Status
To confirm that the Oracle container is running:
% docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
exxxxxxx container-registry.oracle.com/database/free:latest "/bin/bash -c $ORACL…" 2 minutes ago Up 2 minutes (healthy) 0.0.0.0:1521->1521/tcp, [::]:1521->1521/tcp, 0.0.0.0:5500->5500/tcp, [::]:5500->5500/tcp oracledb
The container shows a healthy status, indicating that the internal Oracle startup scripts completed successfully.
Inspecting the Container Configuration
To view detailed configuration information about the container:
% docker inspect oracledb
[
{
"Id": "e2136xxxxxxxxxxxxxxxxxxxxxxx",
"Created": "2026-03-15T19:21:09.773725255Z",
"Path": "/bin/bash",
"Args": [
"-c",
"$ORACLE_BASE/$RUN_FILE"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
….
"Gateway": "172.xx.0.1",
"IPAddress": "172.xx.0.2",
"MacAddress": "06:xx:6e:xx:02:0e",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"DNSNames": null
}
}
}
The output shows the detailed configuration and runtime state of my Oracle database container. The container has the ID e21xxxxxx and is currently running and healthy, indicating that the internal health checks are passing and the database services started successfully. It was created and started on March 15, 2026, and is executing a startup script that launches the Oracle database environment inside the container.

The system is running on a Linux platform using the overlay2 storage driver, which manages the container's layered filesystem. The configuration exposes the standard Oracle ports: 1521 for database connections and 5500 for the web-based management interface. Internally, the container runs under the oracle user and sets environment variables such as ORACLE_BASE=/opt/oracle and ORACLE_HOME=/opt/oracle/product/26ai/dbhomeFree, indicating that the image contains Oracle AI Database 26ai Free.

The container is connected to Docker's default bridge network, where it received the internal IP address 172.xx.0.2, allowing it to communicate with other containers while also mapping its ports to the host machine. The metadata labels confirm that the image is maintained by Oracle and distributed as the Oracle AI Database 26ai Free edition, which is intended for development and learning purposes.
Checking Docker Disk Usage
To review the container storage usage:
% docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 2 2 9.693GB 9.693GB (100%)
Containers 2 1 4.269GB 0B (0%)
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B
Accessing the Container
Once the container is running, I log into it to interact directly with the operating system inside the container. To do this, I used the command docker exec -it oracledb bash, which opens an interactive terminal session within the container. After executing the command, the prompt changes to bash-4.4$, indicating that I am now inside the container as the oracle user, working within the Oracle Linux environment where the Oracle database software is installed and running. From this point, I can execute system commands, check database processes, or interact with the Oracle environment directly.
% docker exec -it oracledb bash
bash-4.4$
bash-4.4$
bash-4.4$ hostname
exxxxxx
bash-4.4$ whoami
oracle
After logging into the container, I verify that the Oracle database instance is running by checking the background processes associated with the database. To do this, I run the command ps -ef | grep pmon, which searches for the PMON (Process Monitor) process. The output shows a process named db_pmon_FREE running under the oracle user, which confirms that the Oracle database instance with the SID FREE is active and operating correctly. The PMON process is an essential background component of Oracle that monitors other processes and performs recovery tasks if necessary. Seeing this process in the output indicates that the database has successfully started inside the container.
bash-4.4$ ps -ef | grep pmon
oracle 36 1 0 19:21 ? 00:00:00 db_pmon_FREE
oracle 838 776 0 19:28 pts/0 00:00:00 grep pmon
bash-4.4$
After confirming that the database processes are running, I proceed to connect to the Oracle database from inside the container. First, I set the environment variable for the database instance by running export ORACLE_SID=FREE, which tells the Oracle tools which database instance to target. Then I start the SQL*Plus client using sqlplus "/as sysdba", which connects with SYSDBA administrative privileges. SQL*Plus starts and displays the version information, indicating SQL*Plus Release 23.26.1.0.0, which corresponds to the Oracle AI Database Free environment packaged in the container. After the client initializes, it prompts for a password to complete the authentication process, allowing administrative access to the database once the correct credentials are provided.
bash-4.4$ sqlplus "/as sysdba"
SQL*Plus: Release 23.26.1.0.0 - Production on Sun Mar 15 19:30:21 2026
Version 23.26.1.0.0
Copyright (c) 1982, 2025, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle AI Database 26ai Free Release 23.26.1.0.0 - Develop, Learn, and Run for Free
Version 23.26.1.0.0
SQL>
After successfully connecting to the Oracle database using SQL*Plus, I verify that the database instance is running properly from within the SQL prompt. Once connected, the prompt changes to SQL>, indicating that I am now interacting directly with the Oracle database engine. To confirm that the instance is open and operating correctly, I run SELECT instance_name, status FROM v$instance;, which reports the current state of the database instance. If everything is functioning as expected, the STATUS column returns OPEN, meaning the database is fully started and available for connections and queries. This step ensures that the Oracle AI Database Free instance inside the container is running correctly before performing administrative tasks or executing SQL commands.
SQL> SELECT instance_name, status FROM v$instance;
INSTANCE_NAME STATUS
---------------- ------------
FREE OPEN
Creating an Application User
After confirming that the Oracle AI Database instance is running, the next step is to create a dedicated application user that will own the tables used in the cancer gene expression analysis.
Using a separate schema improves security and keeps application objects isolated from system accounts.
First, connect to the database as SYSDBA and switch to the pluggable database.
Inside SQL*Plus:
SQL> ALTER SESSION SET CONTAINER=FREEPDB1;
Session altered.
SQL> SHOW CON_NAME;
CON_NAME
------------------------------
FREEPDB1
SQL>
Create the Application User
Now create the user normally (inside a pluggable database, the C## common-user prefix is not required):
SQL>
SQL> CREATE USER ai_healthcare
IDENTIFIED BY xxxxxx
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp
QUOTA UNLIMITED ON users;
User created.
Grant Required Privileges
SQL> GRANT CREATE SESSION TO ai_healthcare;
Grant succeeded.

SQL> GRANT CREATE TABLE TO ai_healthcare;
Grant succeeded.

SQL> GRANT CREATE VIEW TO ai_healthcare;
Grant succeeded.

SQL> GRANT CREATE SEQUENCE TO ai_healthcare;
Grant succeeded.

SQL> GRANT CREATE PROCEDURE TO ai_healthcare;
Grant succeeded.
Connect as the Application User
SQL>
SQL> CONNECT ai_healthcare/xxxxx@FREEPDB1
Connected.
SQL>
Creating the Database Schema
Once the Oracle database is running, the next step is to create the tables used to store the gene expression dataset.
CREATE TABLE patient_labels (
sample_id VARCHAR2(50) PRIMARY KEY,
cancer_type VARCHAR2(10)
);
CREATE TABLE gene_expression (
sample_id VARCHAR2(50),
gene_id VARCHAR2(50),
expression_value NUMBER,
FOREIGN KEY (sample_id)
REFERENCES patient_labels(sample_id)
);
Dataset Preparation
The dataset consists of two files:
- data.csv — gene expression measurements
- labels.csv — tumour classification labels
These files are merged on the sample identifier to associate gene expression values with tumour types.
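The merge step above can be sketched with pandas. The tiny in-memory frames below stand in for the real CSV files; the column names "Unnamed: 0" (sample identifier) and "Class" (tumour label) match the layout used later in the loading code.

```python
import pandas as pd

# Stand-in for data.csv: one row per sample, one column per gene.
data_df = pd.DataFrame({
    "Unnamed: 0": ["sample_0", "sample_1"],
    "gene_0": [0.0, 1.2],
    "gene_1": [2.3, 0.4],
})
# Stand-in for labels.csv: one tumour label per sample.
labels_df = pd.DataFrame({
    "Unnamed: 0": ["sample_0", "sample_1"],
    "Class": ["PRAD", "LUAD"],
})

# Inner-join on the sample identifier so each expression row carries its label.
merged = data_df.merge(labels_df, on="Unnamed: 0", how="inner")
print(merged.shape)  # (2, 4)
```

An inner join is the safe default here: a sample missing from either file is dropped rather than producing rows with missing labels.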
Machine Learning Analysis
After loading the dataset, the analysis pipeline performs several steps:
- feature standardisation using StandardScaler
- dimensionality reduction using PCA
- visualisation using t-SNE
- supervised reduction using LDA
- clustering using K-means
- statistical testing using ANOVA
These techniques reveal patterns within the genomic dataset and help identify relationships between tumour samples.
Loading the Dataset into Oracle Database
Once the application user and schema have been created, the next step is to load the dataset files into Oracle.
The project uses two source files:
- labels.csv, containing the tumour classification for each sample
- data.csv, containing gene expression measurements for each tumour sample
Because data.csv is a wide file with thousands of gene columns, the ingestion process is conceptually divided into two stages:
- load the labels file into patient_labels
- transform the gene expression matrix into a relational structure and load it into gene_expression
In this prototype, Python is used as the data loading and transformation layer.
Connecting Python to Oracle
First, start the Jupyter Notebook server from the terminal:
% python3 -m notebook
...
[I 2026-03-16 09:23:45.955 ServerApp] Extension package jupyter_lsp took 0.1259s to import
[I 2026-03-16 09:23:46.256 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2026-03-16 09:23:46.260 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2026-03-16 09:23:46.264 ServerApp] jupyterlab | extension was successfully linked.
...
[I 2026-03-16 09:23:48.013 ServerApp] Skipped non-installed server(s): basedpyright, bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyrefly, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server

import pandas as pd
import oracledb
connection = oracledb.connect(
user="ai_healthcare",
password="xxxxxxxx",
dsn="localhost:1521/FREEPDB1"
)
The Python pipeline connects to Oracle using the oracledb library.
This connection allows Python to read the CSV files, transform them, and insert the records into the Oracle database.
Loading the Labels File
The tumour labels are loaded first.
labels_df = pd.read_csv("labels.csv")
labels_df.columns = [c.lower() for c in labels_df.columns]
print(labels_df.head())

Assuming the file contains the sample identifier and tumour class, the rows can be inserted into the patient_labels table.
cursor = connection.cursor()
insert_labels_sql = """
INSERT INTO patient_labels (sample_id, cancer_type)
VALUES (:1, :2)
"""
label_rows = [
(row["unnamed: 0"], row["class"])
for _, row in labels_df.iterrows()
]
cursor.executemany(insert_labels_sql, label_rows)
connection.commit()

This step stores the tumour labels in Oracle so that each sample can later be linked to its corresponding gene expression values.
Loading the Gene Expression Data
The data.csv file contains one row per tumour sample and thousands of gene columns. To store this in a normalised Oracle schema, the dataset is reshaped from a wide format into a long format.
data_df = pd.read_csv("data.csv")
data_df.columns = [c.lower() for c in data_df.columns]
print(data_df.shape)

A typical row in data.csv contains:
- the sample identifier
- one column for each gene expression value
To load it into the gene_expression table, the dataset is unpivoted.
id_col = "unnamed: 0"  # column names were lowercased above
long_df = data_df.melt(
id_vars=[id_col],
var_name="gene_id",
value_name="expression_value"
)
long_df.rename(columns={id_col: "sample_id"}, inplace=True)
print(long_df.head())

After transformation, each row represents a single gene expression measurement for one sample.
Example structure: (sample_id, gene_id, expression_value).
These rows can then be inserted into Oracle.
insert_expr_sql = """
INSERT INTO gene_expression (sample_id, gene_id, expression_value)
VALUES (:1, :2, :3)
"""
expr_rows = list(long_df.itertuples(index=False, name=None))
cursor.executemany(insert_expr_sql, expr_rows)
connection.commit()

At this point, the dataset has been embedded into the Oracle database in a form that supports SQL queries, analytics, and downstream machine learning workflows.
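With roughly 800 samples and over 20,000 genes, the long table holds on the order of 16 million rows, so passing the entire list to a single executemany call can be memory-hungry. A batched variant is a common refinement; the helper name, batch size, and stand-in cursor below are illustrative, not part of the original pipeline:

```python
# Hypothetical batching helper: splits the row list into fixed-size chunks
# and calls cursor.executemany once per chunk.
def insert_in_batches(cursor, sql, rows, batch_size=50_000):
    total = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cursor.executemany(sql, batch)
        total += len(batch)
    return total

# Stand-in cursor that just records batch sizes, so the chunking logic
# can be demonstrated without a live database connection.
class FakeCursor:
    def __init__(self):
        self.batch_sizes = []
    def executemany(self, sql, batch):
        self.batch_sizes.append(len(batch))

cur = FakeCursor()
rows = [("s0", f"gene_{i}", 0.0) for i in range(120)]
inserted = insert_in_batches(cur, "INSERT ...", rows, batch_size=50)
print(inserted, cur.batch_sizes)  # 120 [50, 50, 20]
```

With a real python-oracledb cursor, committing once after all batches (as the pipeline does) keeps the load transactional while still bounding per-call memory.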
Retrieving the Dataset from Oracle for Analysis
Once the data is stored in Oracle, Python can query it directly for preprocessing and analysis.
query = """
SELECT g.sample_id,
p.cancer_type,
g.gene_id,
g.expression_value
FROM gene_expression g
JOIN patient_labels p
ON g.sample_id = p.sample_id
"""
df_long = pd.read_sql(query, connection)
print(df_long.head())

Because most machine learning algorithms expect a matrix format, the long dataset is pivoted back into a wide analytical matrix.
df_matrix = df_long.pivot(
index="SAMPLE_ID",
columns="GENE_ID",
values="EXPRESSION_VALUE"
)
labels_from_db = pd.read_sql(
"SELECT sample_id, cancer_type FROM patient_labels",
connection
)
df_matrix = df_matrix.merge(
labels_from_db,
left_index=True,
right_on="SAMPLE_ID"
)

This creates a dataset that can be used for feature scaling, dimensionality reduction, and clustering.
Python Analytics Pipeline
After loading the data from Oracle, the machine learning workflow proceeds in Python.
Standardisation
Gene expression values vary considerably across genes, so the features are standardised before analysis.
from sklearn.preprocessing import StandardScaler
X = df_matrix.drop(columns=["SAMPLE_ID", "CANCER_TYPE"])
y = df_matrix["CANCER_TYPE"]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
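As a quick illustration of what the scaler does: after fit_transform, every column has zero mean and unit variance, regardless of the original scale. A self-contained toy example:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two columns on very different scales, as gene expression values often are.
X_demo = np.array([[1.0, 200.0],
                   [2.0, 400.0],
                   [3.0, 600.0]])
X_demo_scaled = StandardScaler().fit_transform(X_demo)

print(X_demo_scaled.mean(axis=0))  # ~[0, 0]
print(X_demo_scaled.std(axis=0))   # ~[1, 1]
```

This matters for PCA in particular: without standardisation, high-variance genes would dominate the principal components purely because of their scale.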
Principal Component Analysis (PCA)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
pca_results = pd.DataFrame(X_pca, columns=["pca1", "pca2"])
pca_results["sample_id"] = df_matrix["SAMPLE_ID"].values
pca_results["CANCER_TYPE"] = y.values

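How much information two components retain can be read from explained_variance_ratio_. The synthetic example below (not the project's data) shows the typical pattern with strongly correlated features, which is exactly the situation in gene expression matrices:

```python
import numpy as np
from sklearn.decomposition import PCA

# Three features: two strongly correlated, one independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X_demo = np.hstack([
    base,
    2 * base + 0.01 * rng.normal(size=(100, 1)),  # near-copy of the first
    rng.normal(size=(100, 1)),                    # independent noise
])

# Two components are enough to capture almost all the variance here.
pca_demo = PCA(n_components=2).fit(X_demo)
print(pca_demo.explained_variance_ratio_.sum())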
t-SNE
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, learning_rate=50, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)
tsne_results = pd.DataFrame(X_tsne, columns=["tsne_x", "tsne_y"])
tsne_results["SAMPLE_ID"] = df_matrix["SAMPLE_ID"].values
tsne_results["CANCER_TYPE"] = y.values

K-Means Clustering
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_pca)
cluster_results = pd.DataFrame({
"sample_id": df_matrix["SAMPLE_ID"].values,
"cancer_type": y.values,
"cluster_id": clusters
})

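A natural follow-up question is how well the unsupervised clusters agree with the known cancer types. The adjusted Rand index (1.0 for perfect agreement, near 0 for random assignment) answers this; the synthetic two-blob example below shows the call, and applying adjusted_rand_score to `y` and `clusters` above is the analogous step in the pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Two clearly separated point clouds with known labels.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 0.3, (50, 2)),
                    rng.normal(5, 0.3, (50, 2))])
true_labels = np.array([0] * 50 + [1] * 50)

# Cluster without the labels, then compare against them.
pred = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X_demo)
ari = adjusted_rand_score(true_labels, pred)
print(ari)  # close to 1.0 for well-separated groups
```

ARI is invariant to cluster relabelling, so it does not matter that K-means numbers its clusters arbitrarily.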
Writing the Analytical Results Back to Oracle
To support Oracle APEX dashboards, the analytical results are stored back in the database (under the ai_healthcare user).
PCA results table
CREATE TABLE pca_results (
sample_id VARCHAR2(50) PRIMARY KEY,
cancer_type VARCHAR2(10),
pca1 NUMBER,
pca2 NUMBER
);
t-SNE results table
CREATE TABLE tsne_results (
sample_id VARCHAR2(50) PRIMARY KEY,
cancer_type VARCHAR2(10),
tsne_x NUMBER,
tsne_y NUMBER
);
Cluster results table
CREATE TABLE cluster_results (
sample_id VARCHAR2(50) PRIMARY KEY,
cancer_type VARCHAR2(10),
cluster_id NUMBER
);
The Python results can then be inserted into Oracle.
# Clear tables first
cursor.execute("TRUNCATE TABLE pca_results")
cursor.execute("TRUNCATE TABLE tsne_results")
cursor.execute("TRUNCATE TABLE cluster_results")
cursor.executemany(
"INSERT INTO pca_results (sample_id, cancer_type, pca1, pca2) VALUES (:1, :2, :3, :4)",
list(pca_results[["sample_id", "CANCER_TYPE", "pca1", "pca2"]].itertuples(index=False, name=None))
)
cursor.executemany(
"INSERT INTO tsne_results (sample_id, cancer_type, tsne_x, tsne_y) VALUES (:1, :2, :3, :4)",
list(tsne_results[["SAMPLE_ID", "CANCER_TYPE", "tsne_x", "tsne_y"]].itertuples(index=False, name=None))
)
cursor.executemany(
"INSERT INTO cluster_results (sample_id, cancer_type, cluster_id) VALUES (:1, :2, :3)",
list(cluster_results[["sample_id", "cancer_type", "cluster_id"]].itertuples(index=False, name=None))
)
connection.commit()

This step is what makes the Oracle APEX dashboards possible: the charts are not reading directly from the notebook, but from result tables persisted in Oracle.
Connecting Oracle APEX through Oracle REST Data Services (ORDS)
In earlier Oracle container deployments, database administration tools were accessed through Enterprise Manager Database Express using the endpoint:
http://localhost:5500/em
However, in Oracle AI Database 26ai, Enterprise Manager Database Express has been desupported (Oracle Corporation, 2026). As a result, the /em endpoint may not respond even when the database container is running correctly.
Instead, modern Oracle database deployments expose web-based tools through Oracle REST Data Services (ORDS). ORDS acts as the HTTP gateway between web applications and the Oracle database and enables access to Oracle Application Express (APEX) and REST APIs.
After installing ORDS and linking it to the database service FREEPDB1, the service becomes available through a local web server.
Accessing Oracle APEX through ORDS
Once ORDS is running, the APEX environment can be accessed through the browser using the following URL:
http://localhost:8080/ords

From this interface, developers can log in using the APEX administrator account and create workspaces connected to database schemas.
However, when APEX is not installed inside the pluggable database, the APEX tile may display “App Unavailable.”
To enable APEX, the Oracle APEX schema must be installed in the FREEPDB1 pluggable database.
Installing Oracle APEX in the Database
First connect to the database container:
docker exec -it oracledb sqlplus sys/XXXXXXXX@localhost:1521/FREEPDB1 as sysdba
Then run:
@apexins.sql SYSAUX SYSAUX TEMP /i/

SYS> SHOW CON_NAME;
CON_NAME
------------------------------
FREEPDB1
SYS> SELECT comp_id, comp_name, version, status
  2  FROM dba_registry
  3  WHERE comp_id = 'APEX';
COMP_ID   COMP_NAME     VERSION   STATUS
-------   -----------   -------   -------
APEX      Oracle APEX   24.2.0    VALID

Although Oracle APEX 24.2 was successfully installed and validated in the FREEPDB1 pluggable database, the APEX endpoint initially returned HTTP 404 through ORDS. Oracle documents that when ORDS is installed before APEX, ORDS can create placeholder metadata instead of binding to the real APEX schema. The issue was resolved by configuring the database pool with plsql.gateway.mode=proxied, running ords install repair, and ensuring that standalone.static.path pointed to the downloaded APEX images directory.



Connecting APEX to the AI_HEALTHCARE Schema
Inside the APEX administration interface, a workspace can be created to connect the APEX application environment to the schema used in the machine learning pipeline.
Inside APEX:
- Login as ADMIN
- Go to Manage Workspaces
- Create workspace
Workspace name: AI_HEALTHCARE_WS





Once the workspace is created, log into Oracle APEX using the workspace credentials.

Creating an APEX Application
After logging into the APEX workspace:
- Navigate to App Builder
- Click Create Application
- Provide a name:

- Add pages for visualisation, including:
- Interactive reports

- Charts
- Scatter plots
- Cluster visualisations
APEX will automatically use the database schema associated with the workspace.
Creating Dashboard Queries
Each dashboard component is powered by a SQL query executed directly against the Oracle database.
Dataset Overview
This report shows the number of tumour samples per cancer type.
SELECT cancer_type, COUNT(*) AS sample_count
FROM patient_labels
GROUP BY cancer_type
ORDER BY sample_count DESC;

In APEX, configure the chart as:
Label: CANCER_TYPE
Value: SAMPLE_COUNT
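The same overview can be cross-checked on the Python side with a pandas groupby. The tiny frame below is illustrative; in the real pipeline the rows would be read from the patient_labels table:

```python
import pandas as pd

# Illustrative stand-in for the patient_labels table.
labels_df = pd.DataFrame({
    "sample_id": ["s0", "s1", "s2", "s3"],
    "cancer_type": ["BRCA", "BRCA", "KIRC", "LUAD"],
})

# Count samples per cancer type, largest group first,
# mirroring the dashboard's GROUP BY / ORDER BY query.
sample_counts = (labels_df.groupby("cancer_type")
                          .size()
                          .sort_values(ascending=False))
print(sample_counts)
```

Comparing these counts with the APEX chart is a simple way to confirm that the dashboard and the notebook are reading the same data.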

PCA Visualization
Principal Component Analysis (PCA) is used to reduce the dimensionality of the gene expression dataset while preserving as much variance as possible (Jolliffe, 2002).
This scatter plot visualises the first two principal components computed in the Python pipeline.
SELECT pca1,
pca2,
cancer_type
FROM pca_results;
In APEX, configure the chart as:
Chart type: Scatter Plot
X-axis: PCA1
Y-axis: PCA2
Group by: CANCER_TYPE



Each color/shape represents a different cancer type, and the axes represent the first two principal components from the dimensionality reduction pipeline.

The scatter plot displays tumor samples projected onto the first two principal components derived from the gene expression dataset. Each point represents a sample, colored according to its cancer type. The PCA transformation reveals clustering patterns among tumor classes. In particular, kidney renal clear cell carcinoma (KIRC) forms a well-separated cluster, indicating strong differentiation in the reduced feature space, while other cancer types exhibit partial overlap.
t-SNE Cluster Visualization
To better visualize local structure among tumour samples, I applied the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm (van der Maaten & Hinton, 2008).
This chart visualises tumour clusters using t-SNE coordinates.
SELECT
tsne_x,
tsne_y,
cancer_type
FROM tsne_results;



Benefits of Using Oracle APEX for Biomedical Analytics
Integrating APEX into the architecture provides several advantages:
Interactive data exploration
Researchers can explore tumour clusters visually without running Python code.
Real-time database queries
APEX dashboards always reflect the latest results stored in Oracle.
Rapid development
Low-code tools allow dashboards to be created quickly using SQL queries.
Enterprise integration
APEX applications can be integrated with hospital systems, research databases, and clinical analytics platforms.
Final Architecture Overview
The complete Version 2 architecture consists of:
- Python analytics pipeline
- Data preprocessing
- PCA
- t-SNE
- Oracle AI Database 26ai
- Stores genomic data
- Stores machine learning results
- Oracle APEX dashboards
- Visualisation
- Interactive exploration
- Research reporting
GitHub: https://github.com/brunorsreis/-cancer-gene-expression-classification-version2
This architecture demonstrates how machine learning workflows can evolve into a scalable platform combining AI analytics, enterprise databases, and interactive data applications.
References
Hastie, T., Tibshirani, R., & Friedman, J. (2009).
The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Jolliffe, I. T. (2002).
Principal component analysis (2nd ed.). Springer.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, E. (2011).
Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Van der Maaten, L., & Hinton, G. (2008).
Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Merkel, D. (2014).
Docker: Lightweight Linux containers for consistent development and deployment. Linux Journal.
Oracle Corporation. (2026).
Oracle AI Database Documentation: Database Upgrade Guide, "Oracle Database Changes, Desupports, and Deprecations". Oracle.


*The views expressed here are my own and do not represent those of my employer.*
Hello, I’m Bruno — a dual citizen of Brazil and Sweden. I bring a global perspective shaped by experiences in both South America and Europe, with a strong focus on collaboration and innovation across cultures. I am a Computer Scientist, PhD Candidate in Information and Communication Technologies, focusing on Data Science and Artificial Intelligence, and hold dual Master’s degrees in Data Science and Cybersecurity. With over fifteen years of international experience spanning Brazil, Hungary, and Sweden, I have collaborated with global organizations such as IBM, Playtech, and Oracle, as well as contributed remotely to projects across multiple regions. My professional interests include Databases, Cybersecurity, Cloud Computing, Data Science, Data Engineering, Big Data, Artificial Intelligence, Programming, and Software Engineering, all driven by a deep passion for transforming data into strategic business value.