Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data

Introduction
Dataset Description
Data Preparation and Exploratory Analysis
System Setup and Infrastructure
4.1 Installing Oracle AI Database 26ai with Docker
4.2 Creating the Application User
4.3 Creating the Database Schema
4.4 Loading the Dataset into Oracle Database
Python Analytics Pipeline
Writing Analytical Results Back to Oracle
Oracle APEX Integration
7.1 Connecting Oracle APEX to the Database
7.2 Oracle APEX Analytical Dashboards
Adding a Claude Chatbox for Natural Language Queries (Version 3)
Advantages of the Architecture
Conclusion

Version 1:Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data

Version 2: Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data – Version 2 — Integrating Oracle AI Database 26ai and Oracle APEX

Github: https://github.com/brunorsreis/–cancer-gene-expression-classification-version3/

1. Introduction

Version 2 already stores the gene expression dataset in Oracle, runs Python analytics such as PCA, t-SNE, and K-means, and writes the analytical outputs back into Oracle for Oracle APEX dashboards.

Version 3 extends this architecture by adding an interactive chatbox powered by Claude. The goal is to allow researchers, clinicians, or analysts to ask questions in natural language such as:

“How many tumour samples are there for each cancer type?”
“Show the PCA distribution for BRCA samples.”
“Which cancer type forms the most distinct cluster?”
“What genes are most highly expressed for KIRC samples?”
“Summarise the clustering results in plain English.”

Instead of only navigating dashboards, users can now interact conversationally with the data.

2. Version 3 Architecture Overview

The Version 3 architecture contains four layers:

1. Oracle AI Database 26ai

Stores:
- patient_labels
- gene_expression
- pca_results
- tsne_results
- cluster_results

2. Python Middleware

Receives the user’s question from the chatbox
Sends the question to Claude
Uses prompt instructions to convert natural language into SQL or analytical intent
Executes SQL against Oracle
Returns results to the user interface

3. Claude Chat Layer

Interprets natural language questions
Generates safe SQL for approved tables
Produces concise scientific explanations of the returned results

4. Oracle APEX Front End

Existing dashboards remain available
A new chat region is added to the APEX application
Users can ask free-text questions and receive tabular or narrative answers

This is a natural progression from Version 2, which already uses Oracle as the central store for both raw and analytical results.

3. New User Experience

In Version 2, users explore results through charts and reports in APEX. In Version 3, they can additionally use a chat assistant.

Example conversation:

User:
How many samples are available for each cancer type?

Claude-powered assistant:
There are 5 cancer types in the dataset. The counts per type are retrieved from patient_labels using a grouped SQL query.

User:
Which cancer type appears most separated in PCA space?

Claude-powered assistant:
Based on the PCA results, KIRC appears to form the most distinct cluster, with less overlap than the other tumour classes. This matches the interpretation already described in the Version 2 PCA visualization.

4. Adding a Chatbox to Oracle APEX

A new page can be added to the APEX application:

Page type: Blank Page or Region-based page
Regions:
- Chat history region
- Text input item
- Send button
- Response display region

Suggested APEX items:

P_CHAT_INPUT — user question
P_CHAT_RESPONSE — assistant answer
P_SQL_DEBUG — optional generated SQL for administrators

The APEX page sends the text input to a Python API endpoint, which calls Claude and returns the answer.

5. Python API Layer for Claude Integration

A lightweight Python service can sit between APEX and Claude.

Example using Flask:

from flask import Flask, request, jsonify, render_template_string
import os
import re
import json
import oracledb
import anthropic

app = Flask(__name__)

# -----------------------------
# Configuration
# -----------------------------
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "YOUR_ANTHROPIC_API_KEY")

DB_USER = os.getenv("ORACLE_DB_USER", "YOUR_DATABASE_USER")
DB_PASSWORD = os.getenv("ORACLE_DB_PASSWORD", "YOUR_DATABASE_PASSWORD")
DB_DSN = os.getenv("ORACLE_DB_DSN", "localhost:1521/FREEPDB1")

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)

SYSTEM_PROMPT = """
You are a biomedical data assistant.
You help users query an Oracle database containing cancer gene expression data.

Available tables:
- patient_labels(sample_id, cancer_type)
- gene_expression(sample_id, gene_id, expression_value)
- pca_results(sample_id, cancer_type, pca1, pca2)
- tsne_results(sample_id, cancer_type, tsne_x, tsne_y)
- cluster_results(sample_id, cancer_type, cluster_id)

Rules:
- Generate only read-only SQL.
- Never use INSERT, UPDATE, DELETE, DROP, ALTER, TRUNCATE, MERGE, GRANT, REVOKE.
- Use only the available tables listed above.
- Prefer concise Oracle SQL.
- If asked for SQL, return SQL only.
- When given query results, explain them clearly for a biomedical audience.
"""

HTML_PAGE = """
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Claude Cancer Chat</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 40px;
            max-width: 900px;
        }
        textarea, input[type="text"] {
            width: 100%;
            padding: 10px;
            font-size: 16px;
        }
        button {
            padding: 10px 18px;
            font-size: 16px;
            margin-top: 10px;
            cursor: pointer;
        }
        .box {
            border: 1px solid #ccc;
            border-radius: 8px;
            padding: 16px;
            margin-top: 20px;
            background: #fafafa;
        }
        pre {
            white-space: pre-wrap;
            word-wrap: break-word;
            background: #f4f4f4;
            padding: 12px;
            border-radius: 6px;
            overflow-x: auto;
        }
        .error {
            color: #b00020;
            font-weight: bold;
        }
    </style>
</head>
<body>
    <h2>Claude Cancer Chat</h2>
    <p>Ask a question about the cancer gene expression dataset.</p>

    <form method="post" action="/ask">
        <input
            type="text"
            name="message"
            placeholder="How many samples are there for each cancer type?"
            value="{{ message|default('') }}"
            required
        />
        <button type="submit">Send</button>
    </form>

    {% if error %}
    <div class="box">
        <div class="error">Error</div>
        <pre>{{ error }}</pre>
    </div>
    {% endif %}

    {% if answer %}
    <div class="box">
        <h3>Answer</h3>
        <pre>{{ answer }}</pre>
    </div>
    {% endif %}

    {% if sql %}
    <div class="box">
        <h3>Generated SQL</h3>
        <pre>{{ sql }}</pre>
    </div>
    {% endif %}

    {% if result %}
    <div class="box">
        <h3>Query Result</h3>
        <pre>{{ result }}</pre>
    </div>
    {% endif %}
</body>
</html>
"""

# -----------------------------
# Database helpers
# -----------------------------
def get_connection():
    return oracledb.connect(
        user=DB_USER,
        password=DB_PASSWORD,
        dsn=DB_DSN
    )

def extract_sql(text: str) -> str:
    """
    Extract SQL from Claude response.
    Supports plain SQL or fenced ```sql blocks.
    """
    if not text:
        return ""

    fence_match = re.search(r"```sql\s*(.*?)```", text, re.IGNORECASE | re.DOTALL)
    if fence_match:
        return fence_match.group(1).strip()

    generic_fence_match = re.search(r"```\s*(.*?)```", text, re.DOTALL)
    if generic_fence_match:
        return generic_fence_match.group(1).strip()

    return text.strip()

def validate_sql(sql: str) -> None:
    if not sql:
        raise ValueError("Claude returned an empty SQL query.")

    sql_clean = sql.strip().rstrip(";")
    sql_upper = sql_clean.upper()

    forbidden = [
        "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE",
        "MERGE", "GRANT", "REVOKE", "CREATE", "RENAME", "BEGIN", "DECLARE"
    ]

    if not sql_upper.startswith("SELECT"):
        raise ValueError("Only SELECT statements are allowed.")

    for keyword in forbidden:
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise ValueError(f"Forbidden SQL keyword detected: {keyword}")

def run_sql(sql: str, max_rows: int = 50) -> dict:
    validate_sql(sql)

    connection = get_connection()
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description] if cursor.description else []
        rows = cursor.fetchmany(max_rows)
        return {
            "columns": columns,
            "rows": rows,
            "row_count_returned": len(rows)
        }
    finally:
        connection.close()

# -----------------------------
# LLM helpers
# -----------------------------
def generate_sql(user_question: str) -> str:
    sql_prompt = f"""
Convert this question into a safe Oracle SQL query.

Question:
{user_question}

Return SQL only.
"""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=400,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": sql_prompt}]
    )

    text = "".join(
        block.text for block in response.content if hasattr(block, "text")
    ).strip()

    sql = extract_sql(text)
    validate_sql(sql)
    return sql

def explain_result(user_question: str, sql_query: str, result: dict) -> str:
    explanation_prompt = f"""
User question:
{user_question}

SQL used:
{sql_query}

Query result:
{json.dumps(result, default=str, indent=2)}

Provide a concise explanation of the answer in plain English.
"""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=500,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": explanation_prompt}]
    )

    answer = "".join(
        block.text for block in response.content if hasattr(block, "text")
    ).strip()

    return answer

def process_question(user_question: str) -> dict:
    if not user_question or not user_question.strip():
        raise ValueError("Question is empty.")

    sql_query = generate_sql(user_question)
    result = run_sql(sql_query)
    answer = explain_result(user_question, sql_query, result)

    return {
        "sql": sql_query,
        "result": result,
        "answer": answer
    }

# -----------------------------
# Routes
# -----------------------------
@app.route("/", methods=["GET"])
def home():
    return render_template_string(HTML_PAGE)

@app.route("/ask", methods=["POST"])
def ask():
    message = request.form.get("message", "").strip()

    try:
        output = process_question(message)
        return render_template_string(
            HTML_PAGE,
            message=message,
            answer=output["answer"],
            sql=output["sql"],
            result=json.dumps(output["result"], default=str, indent=2)
        )
    except Exception as e:
        return render_template_string(
            HTML_PAGE,
            message=message,
            error=str(e)
        ), 500

@app.route("/chat", methods=["POST"])
def chat():
    try:
        data = request.get_json(silent=True) or {}
        user_question = (data.get("message") or "").strip()
        output = process_question(user_question)
        return jsonify(output)
    except Exception as e:
        return jsonify({"error": str(e)}), 500

# -----------------------------
# App start
# -----------------------------
if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5001, debug=True, use_reloader=False)

6. Example Queries Claude Could Generate

For a question like:

“How many samples are there for each cancer type?”

Claude could generate:

SELECT cancer_type, COUNT(*) AS sample_count
FROM patient_labels
GROUP BY cancer_type
ORDER BY sample_count DESC

That aligns with the dataset overview query already used in APEX dashboards in Version 2.

After the purchase of the credits, it took a short while for Claude to reflect the updated balance in the system. During this period, API requests may still return errors related to insufficient credit, even though the billing dashboard shows an available balance. This delay appears to be due to synchronization between the billing system and API access layer, and it typically resolves automatically after a few minutes without further intervention.

For:

“Show PCA coordinates for LUAD samples.”

SELECT sample_id, pca1, pca2
FROM pca_results
WHERE cancer_type = 'LUAD'
ORDER BY sample_id

…..

For:

“How many samples are in each cluster?”

SELECT cluster_id, COUNT(*) AS sample_count
FROM cluster_results
GROUP BY cluster_id
ORDER BY cluster_id

7. APEX Call to the Chat API

From Oracle APEX, the Send button can call the Python endpoint using a REST request. The response can then be displayed in a page region.

Conceptually:

User types question in P_CHAT_INPUT
APEX sends POST request to /chat
API returns:
- answer
- SQL
- result rows
APEX renders the answer in the chat history region

This keeps the UI in APEX while letting Claude handle natural language interpretation.

8. Recommended Safety Controls

Because the chatbox generates SQL dynamically, Version 3 should include guardrails:

Restrict to SELECT-only
Allow queries only on approved tables
Limit result size
Log generated SQL
Validate SQL before execution
Block schema-changing statements
Optionally add a whitelist of permitted columns

This is especially important because the database already contains both raw and analytical outputs.

9. Benefits of Version 3

Version 3 adds major value to the Version 2 platform:

Conversational analytics
Users ask questions in plain language instead of writing SQL.
Improved accessibility
Non-technical users can explore the dataset without needing Python or database knowledge.
Faster insight discovery
Researchers can move from static dashboards to interactive question answering.
Natural extension of APEX
The dashboard system remains intact while gaining an AI interaction layer.
Bridging BI and LLMs
Oracle remains the structured data source, while Claude becomes the reasoning and explanation interface.

10. Final Version 3 Summary

Version 2 already established a strong architecture with:

Oracle AI Database 26ai for storage
Python for machine learning
Oracle APEX for dashboards

Version 1: Python-only analytics
Version 2: Python + Oracle AI Database + APEX dashboards
Version 3: Python + Oracle AI Database + APEX dashboards + Claude chat assistant

Conclusion

Version 3 represents a significant evolution of the system, transforming it from a structured analytics platform into a fully interactive AI-driven exploration tool.

While Version 2 successfully integrated Oracle AI Database, Python analytics, and Oracle APEX dashboards, Version 3 introduces a conversational interface powered by Claude, enabling users to query complex biomedical data using natural language.

This enhancement not only improves usability but also democratizes access to advanced analytics, allowing both technical and non-technical users to extract meaningful insights from gene expression data.

By combining:

robust data storage (Oracle AI Database),
advanced analytics (Python),
intuitive visualisation (APEX), and
intelligent interaction (Claude),

the system becomes a comprehensive, scalable, and user-friendly AI platform for cancer data analysis.

Future enhancements could include:

integration with predictive models
real-time data ingestion
personalised clinical insights
multi-modal data support (e.g., imaging + genomics)

Version 3 therefore lays the foundation for a new generation of AI-assisted biomedical analytics systems.

brunors

*The views expressed here are my own and do not represent those of my employer.*

Hello, I’m Bruno — a dual citizen of Brazil and Sweden. I bring a global perspective shaped by experiences in both South America and Europe, with a strong focus on collaboration and innovation across cultures. I am a Computer Scientist, PhD Candidate in Information and Communication Technologies, focusing on Data Science and Artificial Intelligence, and hold dual Master’s degrees in Data Science and Cybersecurity. With over fifteen years of international experience spanning Brazil, Hungary, and Sweden, I have collaborated with global organizations such as IBM, Playtech, and Oracle, as well as contributed remotely to projects across multiple regions. My professional interests include Databases, Cybersecurity, Cloud Computing, Data Science, Data Engineering, Big Data, Artificial Intelligence, Programming, and Software Engineering, all driven by a deep passion for transforming data into strategic business value.

Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data – Version 3 – Integrating Oracle AI Database 26ai, Oracle APEX, and a Claude-Powered Chat Interface

Table of Contents

1. Introduction

2. Version 3 Architecture Overview

3. New User Experience

4. Adding a Chatbox to Oracle APEX

5. Python API Layer for Claude Integration

6. Example Queries Claude Could Generate

7. APEX Call to the Chat API

8. Recommended Safety Controls

9. Benefits of Version 3

10. Final Version 3 Summary

Conclusion

Related posts

Table of Contents

1. Introduction

2. Version 3 Architecture Overview

3. New User Experience

4. Adding a Chatbox to Oracle APEX

5. Python API Layer for Claude Integration

6. Example Queries Claude Could Generate

7. APEX Call to the Chat API

8. Recommended Safety Controls

9. Benefits of Version 3

10. Final Version 3 Summary

Conclusion

Related posts

Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data – Version 2 — Integrating Oracle AI Database 26ai and Oracle APEX

Supercharging Your Application with Redis + Aurora PostgreSQL: A Practical Guide from Local Setup to Full AWS Deployment