Table of Contents
Version 1: Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data
Version 2: Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data – Version 2 — Integrating Oracle AI Database 26ai and Oracle APEX
GitHub: https://github.com/brunorsreis/–cancer-gene-expression-classification-version3/
1. Introduction
Version 2 already stores the gene expression dataset in Oracle, runs Python analytics such as PCA, t-SNE, and K-means, and writes the analytical outputs back into Oracle for Oracle APEX dashboards. Version…
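To make the K-means step above concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is an illustration under stated assumptions, not the project's code: the function name `kmeans`, the random initialisation with a fixed seed, and the toy vectors are hypothetical, and on the real expression matrix a library implementation (for example scikit-learn's `KMeans`) would normally be used. Writing the results back to Oracle is not shown.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (lists of floats) into k groups with Lloyd's algorithm.

    Hypothetical helper for illustration; returns (labels, centroids).
    Initialisation is plain random sampling, not k-means++.
    """
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
    return labels, centroids
```

For example, `kmeans([[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9]], k=2)` places the first two toy samples in one cluster and the last two in the other. Cluster numbers themselves are arbitrary, which is why dashboards usually compare clusters against known tumour labels rather than reading the IDs directly.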
Category: Data Analytics (Hadoop/Streaming Data/Machine Learning/Data and information visualization/Big Data)
Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data – Version 2 — Integrating Oracle AI Database 26ai and Oracle APEX
Introduction
Artificial intelligence (AI) is increasingly transforming healthcare by enabling the analysis of complex biomedical datasets and supporting advances in precision medicine. One important application is the classification of cancer types using gene expression data. Gene expression datasets measure the activity of thousands of genes simultaneously, allowing researchers to identify molecular patterns associated with specific tumour types. These datasets are typically extremely high dimensional and require advanced machine learning techniques to extract meaningful insights. The dataset used in this project contains approximately 802 tumour samples and more than 20,000 gene…
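Because every sample carries thousands of gene features, dimensionality reduction such as PCA is a common first step. As a minimal, dependency-free illustration (not the author's pipeline; the function name and toy data are hypothetical), the sketch below finds the first principal component by power iteration on the sample covariance matrix. For a real matrix of roughly 800 samples by more than 20,000 genes, a library such as scikit-learn's `PCA`, which works from an SVD of the centred data instead of building the full gene-by-gene covariance matrix, would be the practical choice.

```python
import math


def first_principal_component(X, iterations=500, tol=1e-12):
    """Return (unit vector, scores) for the top principal component of X.

    X is a list of samples, each a list of feature values (e.g. expression
    levels). Hypothetical helper: power iteration on the covariance matrix.
    """
    n, d = len(X), len(X[0])
    # Centre each feature (column) at zero mean.
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix C = Xc^T Xc / (n - 1), shape d x d.
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
         for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # start from a uniform unit vector
    for _ in range(iterations):
        # Multiply by C and renormalise; converges to the top eigenvector.
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with no variance
            break
        w = [x / norm for x in w]
        delta = max(abs(a - b) for a, b in zip(w, v))
        v = w
        if delta < tol:
            break
    # Project each centred sample onto the component to get its score.
    scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    return v, scores
```

The scores are the one-dimensional coordinates a PCA scatter plot would show along its first axis; repeating the procedure on the residual (deflation) would give further components.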
Conceptualising an AI System for Cancer Type Classification Using Gene Expression Data
Introduction
Artificial intelligence (AI) is transforming healthcare by enabling advanced analysis of complex biomedical data and supporting clinical decision-making. Traditional statistical methods often struggle with high-dimensional datasets, while machine learning can identify patterns directly from large amounts of data. According to Abtahi and Astaraki (2026), AI is particularly valuable in healthcare when analysing large datasets where traditional methods are limited. One important application of AI is cancer classification using gene expression data. These datasets measure the activity of thousands of genes simultaneously, helping researchers identify molecular patterns linked to specific…
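As a minimal sketch of what supervised cancer-type classification on expression profiles involves, the example below uses a nearest-centroid classifier: it averages the training profiles of each class and assigns a new sample to the closest average. This is an assumption chosen for illustration, not the model used in the article; the class labels `type_A` and `type_B` and the three-gene vectors are hypothetical, whereas real profiles contain thousands of genes.

```python
import math
from collections import defaultdict


def fit_centroids(samples, labels):
    """Average the expression vectors belonging to each class label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        # Accumulate feature-wise sums for this class.
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in vec] for y, vec in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest (Euclidean) to sample x."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


train = [[8.0, 1.0, 0.5], [7.5, 1.2, 0.4], [1.0, 6.0, 0.3], [1.2, 6.5, 0.2]]
train_labels = ["type_A", "type_A", "type_B", "type_B"]
model = fit_centroids(train, train_labels)
```

Calling `predict(model, [7.8, 1.1, 0.5])` returns `"type_A"`. Nearest centroid is deliberately simple; with thousands of noisy genes it is usually combined with feature selection or dimensionality reduction, and evaluated on held-out samples rather than the training set.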
Building a Semantic Search API with MySQL Vector Search, Oracle Cloud, and an NBA Kaggle Dataset
Semantic search enables users to ask for “a 3-and-D wing who can guard multiple positions” (as described in The Kings Beat article) and retrieve the right NBA players, rather than relying on a simple keyword match on “3,” “wing,” or “defense.” By combining MySQL Vector Search, Oracle Cloud’s Generative AI Embeddings, and a real NBA dataset from Kaggle, you can build a powerful natural-language search API that understands the meaning behind a query. As I love sports, I thought this would make a great demonstration. In this tutorial, you’ll build: We’ll use the…
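At its core, vector search ranks stored embeddings by their similarity to the embedding of the query. The sketch below shows only that ranking step, in plain Python with cosine similarity. The three-dimensional vectors and the `player_*` labels are made-up placeholders (real embedding vectors have far more dimensions), and the code does not use MySQL's own vector functions, which would perform this comparison inside the database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=2):
    """Return the k labels in corpus [(label, vector), ...] most similar to query."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in corpus]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]


corpus = [
    ("player_a", [0.9, 0.1, 0.3]),  # hypothetical embedding
    ("player_b", [0.1, 0.9, 0.2]),
    ("player_c", [0.8, 0.2, 0.4]),
]
```

With the query vector `[1.0, 0.0, 0.3]`, `top_k` returns `["player_a", "player_c"]`: both point in roughly the same direction as the query, even though none of the vectors match it exactly, which is the behaviour that lets a descriptive question find relevant players without shared keywords.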
From Stockholm Marathon to Multi-cloud (OCI, AWS, Azure, GCP) Strategy
Running a marathon is a monumental undertaking. Even though I had been preparing for 1.8 years, it wasn’t until a few weeks ago that I truly grasped the significance of this feat. The turning point came when I made the decision to run for a charity organization in Brazil, specifically Casa da Criança Paralítica de Campinas (https://www.ccp.org.br/web/). This organization provides vital medical, dental, and pedagogical support to children with physical and mental disabilities in Campinas, São Paulo State, Brazil. Moreover, during my preparations, I delved into the origin of the…
EXAM PREPARATION – AWS Certified Machine Learning – Specialty
I have been studying not only for this Machine Learning certification but also for others in Data Engineering since I started my master’s studies in Data Science a few years ago. The reason is that I was already working with the AWS console and was overwhelmed by the changes in data management I saw while working as a database administrator. I still remember everyone talking about this Machine Learning thing, and back then I had no clue what it was. I saw many database administrators mistakenly changing their role…
Transforming a .CSV File into Apache Parquet Format via AWS Glue
I had the task of storing a .CSV file in Amazon S3, but the prerequisite was converting it to Apache Parquet format. From this perspective, I assume that one of the reasons for this conversion is cost reduction: Parquet is a compressed, columnar format, so it takes less storage and query services that bill by the amount of data scanned, such as Amazon Athena, read less data per query. AWS Glue is the perfect tool to accomplish this task. I thought about describing and showing my implementation here, but I realized that there are many how-tos that I followed, and their links are shared below as sources:…
Exam Preparation – Google Cloud Certified – Professional Data Engineer
Google Professional Data Engineer is one of the most exciting and interesting exams I was aiming to pass. For most of my career, I worked mainly in database management, until I realized something had changed: relational databases were no longer the automatic choice for data management. A Data Engineer is expected to work not only with databases but also with streaming technologies, microservices, third-party APIs, data pipelines, big data technologies, etc. Back then, I had no idea what was happening. So what does one do when one does not know…
Exam Preparation – Google Cloud Certified – Professional Cloud Database Engineer
I have aimed to earn a Google certification since the start of my career. However, I only had the opportunity to work with Google’s products in the last 3 years, when I began working with Google Cloud (GCP). Therefore, I started my preparation in the area that is my domain: databases. I did so mainly because I wanted to go deeper into a product that, like Athena on AWS and Oracle among relational databases, has become one of my favorite products to work with: BigQuery, Google’s fully managed, serverless data warehouse. To my…
Exam Preparation – AWS Certified Data Analytics – Specialty
Many people don’t know that the idea of learning from data is not new. John Tukey pointed it out over 50 years ago in “The Future of Data Analysis” (Donoho, 2017). More and more companies have a Data Warehouse, a Data Lake, or a combination of both (a Data Lakehouse) that can handle massive amounts of data, such as 900 TB, with data ingested continuously and SQL-like queries and operators accessing it throughout the day. What is the reason behind it? Identifying and visualizing patterns…