Table of Contents
- Introduction
- Dataset Description
- Data Preparation and Exploratory Analysis
- System Setup and Infrastructure
4.1 Installing Oracle AI Database 26ai with Docker
4.2 Creating the Application User
4.3 Creating the Database Schema
4.4 Loading the Dataset into Oracle Database - Python Analytics Pipeline
- Writing Analytical Results Back to Oracle
- Oracle APEX Integration
7.1 Connecting Oracle APEX to the Database
7.2 Oracle APEX Analytical Dashboards - Adding a Claude Chatbox for Natural Language Queries (Version 3)
- Advantages of the Architecture
- Conclusion
Github: https://github.com/brunorsreis/–cancer-gene-expression-classification-version3/
1. Introduction
Version 2 already stores the gene expression dataset in Oracle, runs Python analytics such as PCA, t-SNE, and K-means, and writes the analytical outputs back into Oracle for Oracle APEX dashboards.
Version 3 extends this architecture by adding an interactive chatbox powered by Claude. The goal is to allow researchers, clinicians, or analysts to ask questions in natural language such as:
- “How many tumour samples are there for each cancer type?”
- “Show the PCA distribution for BRCA samples.”
- “Which cancer type forms the most distinct cluster?”
- “What genes are most highly expressed for KIRC samples?”
- “Summarise the clustering results in plain English.”
Instead of only navigating dashboards, users can now interact conversationally with the data.
2. Version 3 Architecture Overview
The Version 3 architecture contains four layers:
1. Oracle AI Database 26ai
- Stores:
patient_labelsgene_expressionpca_resultstsne_resultscluster_results
2. Python Middleware
- Receives the user’s question from the chatbox
- Sends the question to Claude
- Uses prompt instructions to convert natural language into SQL or analytical intent
- Executes SQL against Oracle
- Returns results to the user interface
3. Claude Chat Layer
- Interprets natural language questions
- Generates safe SQL for approved tables
- Produces concise scientific explanations of the returned results
4. Oracle APEX Front End
- Existing dashboards remain available
- A new chat region is added to the APEX application
- Users can ask free-text questions and receive tabular or narrative answers
This is a natural progression from Version 2, which already uses Oracle as the central store for both raw and analytical results.
3. New User Experience
In Version 2, users explore results through charts and reports in APEX. In Version 3, they can additionally use a chat assistant.
Example conversation:
User:
How many samples are available for each cancer type?
Claude-powered assistant:
There are 5 cancer types in the dataset. The counts per type are retrieved from patient_labels using a grouped SQL query.
User:
Which cancer type appears most separated in PCA space?
Claude-powered assistant:
Based on the PCA results, KIRC appears to form the most distinct cluster, with less overlap than the other tumour classes. This matches the interpretation already described in the Version 2 PCA visualization.
4. Adding a Chatbox to Oracle APEX
A new page can be added to the APEX application:
- Page type: Blank Page or Region-based page
- Regions:
- Chat history region
- Text input item
- Send button
- Response display region
Suggested APEX items:
P_CHAT_INPUT— user questionP_CHAT_RESPONSE— assistant answerP_SQL_DEBUG— optional generated SQL for administrators
The APEX page sends the text input to a Python API endpoint, which calls Claude and returns the answer.
5. Python API Layer for Claude Integration
A lightweight Python service can sit between APEX and Claude.
Example using Flask:
from flask import Flask, request, jsonify, render_template_string
import os
import re
import json
import oracledb
import anthropic
app = Flask(__name__)
# -----------------------------
# Configuration
# -----------------------------
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "YOUR_ANTHROPIC_API_KEY")
DB_USER = os.getenv("ORACLE_DB_USER", "YOUR_DATABASE_USER")
DB_PASSWORD = os.getenv("ORACLE_DB_PASSWORD", "YOUR_DATABASE_PASSWORD")
DB_DSN = os.getenv("ORACLE_DB_DSN", "localhost:1521/FREEPDB1")
client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
SYSTEM_PROMPT = """
You are a biomedical data assistant.
You help users query an Oracle database containing cancer gene expression data.
Available tables:
- patient_labels(sample_id, cancer_type)
- gene_expression(sample_id, gene_id, expression_value)
- pca_results(sample_id, cancer_type, pca1, pca2)
- tsne_results(sample_id, cancer_type, tsne_x, tsne_y)
- cluster_results(sample_id, cancer_type, cluster_id)
Rules:
- Generate only read-only SQL.
- Never use INSERT, UPDATE, DELETE, DROP, ALTER, TRUNCATE, MERGE, GRANT, REVOKE.
- Use only the available tables listed above.
- Prefer concise Oracle SQL.
- If asked for SQL, return SQL only.
- When given query results, explain them clearly for a biomedical audience.
"""
HTML_PAGE = """
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Claude Cancer Chat</title>
<style>
body {
font-family: Arial, sans-serif;
margin: 40px;
max-width: 900px;
}
textarea, input[type="text"] {
width: 100%;
padding: 10px;
font-size: 16px;
}
button {
padding: 10px 18px;
font-size: 16px;
margin-top: 10px;
cursor: pointer;
}
.box {
border: 1px solid #ccc;
border-radius: 8px;
padding: 16px;
margin-top: 20px;
background: #fafafa;
}
pre {
white-space: pre-wrap;
word-wrap: break-word;
background: #f4f4f4;
padding: 12px;
border-radius: 6px;
overflow-x: auto;
}
.error {
color: #b00020;
font-weight: bold;
}
</style>
</head>
<body>
<h2>Claude Cancer Chat</h2>
<p>Ask a question about the cancer gene expression dataset.</p>
<form method="post" action="/ask">
<input
type="text"
name="message"
placeholder="How many samples are there for each cancer type?"
value="{{ message|default('') }}"
required
/>
<button type="submit">Send</button>
</form>
{% if error %}
<div class="box">
<div class="error">Error</div>
<pre>{{ error }}</pre>
</div>
{% endif %}
{% if answer %}
<div class="box">
<h3>Answer</h3>
<pre>{{ answer }}</pre>
</div>
{% endif %}
{% if sql %}
<div class="box">
<h3>Generated SQL</h3>
<pre>{{ sql }}</pre>
</div>
{% endif %}
{% if result %}
<div class="box">
<h3>Query Result</h3>
<pre>{{ result }}</pre>
</div>
{% endif %}
</body>
</html>
"""
# -----------------------------
# Database helpers
# -----------------------------
def get_connection():
return oracledb.connect(
user=DB_USER,
password=DB_PASSWORD,
dsn=DB_DSN
)
def extract_sql(text: str) -> str:
"""
Extract SQL from Claude response.
Supports plain SQL or fenced ```sql blocks.
"""
if not text:
return ""
fence_match = re.search(r"```sql\s*(.*?)```", text, re.IGNORECASE | re.DOTALL)
if fence_match:
return fence_match.group(1).strip()
generic_fence_match = re.search(r"```\s*(.*?)```", text, re.DOTALL)
if generic_fence_match:
return generic_fence_match.group(1).strip()
return text.strip()
def validate_sql(sql: str) -> None:
if not sql:
raise ValueError("Claude returned an empty SQL query.")
sql_clean = sql.strip().rstrip(";")
sql_upper = sql_clean.upper()
forbidden = [
"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE",
"MERGE", "GRANT", "REVOKE", "CREATE", "RENAME", "BEGIN", "DECLARE"
]
if not sql_upper.startswith("SELECT"):
raise ValueError("Only SELECT statements are allowed.")
for keyword in forbidden:
if re.search(rf"\b{keyword}\b", sql_upper):
raise ValueError(f"Forbidden SQL keyword detected: {keyword}")
def run_sql(sql: str, max_rows: int = 50) -> dict:
validate_sql(sql)
connection = get_connection()
try:
cursor = connection.cursor()
cursor.execute(sql)
columns = [col[0] for col in cursor.description] if cursor.description else []
rows = cursor.fetchmany(max_rows)
return {
"columns": columns,
"rows": rows,
"row_count_returned": len(rows)
}
finally:
connection.close()
# -----------------------------
# LLM helpers
# -----------------------------
def generate_sql(user_question: str) -> str:
sql_prompt = f"""
Convert this question into a safe Oracle SQL query.
Question:
{user_question}
Return SQL only.
"""
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=400,
system=SYSTEM_PROMPT,
messages=[{"role": "user", "content": sql_prompt}]
)
text = "".join(
block.text for block in response.content if hasattr(block, "text")
).strip()
sql = extract_sql(text)
validate_sql(sql)
return sql
def explain_result(user_question: str, sql_query: str, result: dict) -> str:
explanation_prompt = f"""
User question:
{user_question}
SQL used:
{sql_query}
Query result:
{json.dumps(result, default=str, indent=2)}
Provide a concise explanation of the answer in plain English.
"""
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=500,
system=SYSTEM_PROMPT,
messages=[{"role": "user", "content": explanation_prompt}]
)
answer = "".join(
block.text for block in response.content if hasattr(block, "text")
).strip()
return answer
def process_question(user_question: str) -> dict:
if not user_question or not user_question.strip():
raise ValueError("Question is empty.")
sql_query = generate_sql(user_question)
result = run_sql(sql_query)
answer = explain_result(user_question, sql_query, result)
return {
"sql": sql_query,
"result": result,
"answer": answer
}
# -----------------------------
# Routes
# -----------------------------
@app.route("/", methods=["GET"])
def home():
return render_template_string(HTML_PAGE)
@app.route("/ask", methods=["POST"])
def ask():
message = request.form.get("message", "").strip()
try:
output = process_question(message)
return render_template_string(
HTML_PAGE,
message=message,
answer=output["answer"],
sql=output["sql"],
result=json.dumps(output["result"], default=str, indent=2)
)
except Exception as e:
return render_template_string(
HTML_PAGE,
message=message,
error=str(e)
), 500
@app.route("/chat", methods=["POST"])
def chat():
try:
data = request.get_json(silent=True) or {}
user_question = (data.get("message") or "").strip()
output = process_question(user_question)
return jsonify(output)
except Exception as e:
return jsonify({"error": str(e)}), 500
# -----------------------------
# App start
# -----------------------------
if __name__ == "__main__":
app.run(host="127.0.0.1", port=5001, debug=True, use_reloader=False)


6. Example Queries Claude Could Generate
For a question like:
“How many samples are there for each cancer type?”
Claude could generate:
SELECT cancer_type, COUNT(*) AS sample_count
FROM patient_labels
GROUP BY cancer_type
ORDER BY sample_count DESC
That aligns with the dataset overview query already used in APEX dashboards in Version 2.


After the purchase of the credits, it took a short while for Claude to reflect the updated balance in the system. During this period, API requests may still return errors related to insufficient credit, even though the billing dashboard shows an available balance. This delay appears to be due to synchronization between the billing system and API access layer, and it typically resolves automatically after a few minutes without further intervention.


For:
“Show PCA coordinates for LUAD samples.”
SELECT sample_id, pca1, pca2
FROM pca_results
WHERE cancer_type = 'LUAD'
ORDER BY sample_id


…..

For:
“How many samples are in each cluster?”
SELECT cluster_id, COUNT(*) AS sample_count
FROM cluster_results
GROUP BY cluster_id
ORDER BY cluster_id


7. APEX Call to the Chat API
From Oracle APEX, the Send button can call the Python endpoint using a REST request. The response can then be displayed in a page region.
Conceptually:
- User types question in
P_CHAT_INPUT - APEX sends POST request to
/chat - API returns:
- answer
- SQL
- result rows
- APEX renders the answer in the chat history region
This keeps the UI in APEX while letting Claude handle natural language interpretation.
8. Recommended Safety Controls
Because the chatbox generates SQL dynamically, Version 3 should include guardrails:
- Restrict to SELECT-only
- Allow queries only on approved tables
- Limit result size
- Log generated SQL
- Validate SQL before execution
- Block schema-changing statements
- Optionally add a whitelist of permitted columns
This is especially important because the database already contains both raw and analytical outputs.
9. Benefits of Version 3
Version 3 adds major value to the Version 2 platform:
- Conversational analytics
Users ask questions in plain language instead of writing SQL. - Improved accessibility
Non-technical users can explore the dataset without needing Python or database knowledge. - Faster insight discovery
Researchers can move from static dashboards to interactive question answering. - Natural extension of APEX
The dashboard system remains intact while gaining an AI interaction layer. - Bridging BI and LLMs
Oracle remains the structured data source, while Claude becomes the reasoning and explanation interface.
10. Final Version 3 Summary
Version 2 already established a strong architecture with:
- Oracle AI Database 26ai for storage
- Python for machine learning
- Oracle APEX for dashboards
Version 1: Python-only analytics
Version 2: Python + Oracle AI Database + APEX dashboards
Version 3: Python + Oracle AI Database + APEX dashboards + Claude chat assistant
Conclusion
Version 3 represents a significant evolution of the system, transforming it from a structured analytics platform into a fully interactive AI-driven exploration tool.
While Version 2 successfully integrated Oracle AI Database, Python analytics, and Oracle APEX dashboards, Version 3 introduces a conversational interface powered by Claude, enabling users to query complex biomedical data using natural language.
This enhancement not only improves usability but also democratizes access to advanced analytics, allowing both technical and non-technical users to extract meaningful insights from gene expression data.
By combining:
- robust data storage (Oracle AI Database),
- advanced analytics (Python),
- intuitive visualisation (APEX), and
- intelligent interaction (Claude),
the system becomes a comprehensive, scalable, and user-friendly AI platform for cancer data analysis.
Future enhancements could include:
- integration with predictive models
- real-time data ingestion
- personalised clinical insights
- multi-modal data support (e.g., imaging + genomics)
Version 3 therefore lays the foundation for a new generation of AI-assisted biomedical analytics systems.


*The views expressed here are my own and do not represent those of my employer.*
Hello, I’m Bruno — a dual citizen of Brazil and Sweden. I bring a global perspective shaped by experiences in both South America and Europe, with a strong focus on collaboration and innovation across cultures. I am a Computer Scientist, PhD Candidate in Information and Communication Technologies, focusing on Data Science and Artificial Intelligence, and hold dual Master’s degrees in Data Science and Cybersecurity. With over fifteen years of international experience spanning Brazil, Hungary, and Sweden, I have collaborated with global organizations such as IBM, Playtech, and Oracle, as well as contributed remotely to projects across multiple regions. My professional interests include Databases, Cybersecurity, Cloud Computing, Data Science, Data Engineering, Big Data, Artificial Intelligence, Programming, and Software Engineering, all driven by a deep passion for transforming data into strategic business value.