Modern data analytics projects often take place in shared or restricted Oracle Cloud Infrastructure (OCI) environments, where not all users have administrative privileges. In such cases, demonstrating an end-to-end data pipeline from data ingestion and transformation to visualization can seem challenging.
This post uses Oracle Autonomous AI Database (AADB), the evolution of Autonomous Data Warehouse (ADW) and Autonomous Transaction Processing (ATP), along with Oracle Analytics Cloud (OAC). It highlights how users can still build and test data engineering workflows without elevated IAM permissions or access to Oracle Data Flow or Big Data Service clusters.
The main objective is to demonstrate that even within restricted OCI environments, it is possible to simulate, transform, and analyze data entirely within Oracle’s managed ecosystem, maintaining both security compliance and practical execution.
Why I Did Not Implement the Administrative Permission Fix
During the implementation, when attempting to persist data within Oracle Machine Learning (OML) notebooks, I encountered the following error:
ORA-06598: insufficient INHERIT PRIVILEGES privilege
This issue occurs because OML’s internal schema (PYQSYS) must inherit privileges from the database user (for example, ADMIN) to create tables and database objects on the user’s behalf. The typical solution is to execute SQL grants such as:
%sql
GRANT INHERIT PRIVILEGES ON USER PYQSYS TO ADMIN;
GRANT INHERIT PRIVILEGES ON USER ADMIN TO PYQSYS;
GRANT CREATE TABLE, CREATE VIEW, CREATE SEQUENCE TO ADMIN;
GRANT UNLIMITED TABLESPACE TO ADMIN;
ALTER USER ADMIN QUOTA UNLIMITED ON DATA;
However, because this blog was developed in an OCI environment without administrative permissions, I did not implement these grants.
In enterprise contexts, such privilege modifications are restricted to Cloud or Database Administrators for compliance and tenancy security.
To continue the project without waiting for privilege approval, I adopted a practical and secure workaround using the oml.create() function—part of the modern OML4Py API. This approach bypasses the inheritance dependency and allows creating the target table (SALES_TRANSFORMED) directly from a pandas DataFrame, fully within the Autonomous AI Database environment.
This method not only maintains compatibility with restricted environments but also demonstrates how data engineers and analysts can perform advanced transformations without modifying IAM or database-level grants, making it ideal for corporate testing and proof-of-concept projects.
Step 1: Setting Up the Oracle Cloud Infrastructure (OCI) Environment
Before we begin, ensure you have access to Oracle Cloud Infrastructure (OCI) with the appropriate permissions for Big Data, Object Storage, Data Flow, and OAC.
Setup overview:
1- Create a Compartment for your Big Data and Analytics resources. (Prerequisite policy example: Allow group <group_name> to manage compartments in tenancy)
From the top left hamburger menu (☰), go to Identity & Security → Compartments.

Click “Create Compartment”.

2- Configure a Virtual Cloud Network (VCN) with subnets for Big Data nodes and OAC. (Prerequisite policy example: Allow group NetworkAdmins to manage virtual-network-family in tenancy)
Open the Navigation Menu (☰) → go to Networking → Virtual Cloud Networks.

Click “Create VCN Wizard.” → Choose “VCN with Internet Connectivity.”
Fill out the form:
Name: BigData-OAC-VCN-demo
Compartment: <Your compartment name>
IPv4 CIDR Block: 10.0.0.0/16 (default is fine unless your org uses a specific range)
Click Next and review the automatically created components:
VCN
Public Subnet (for OAC or public endpoints) and Private Subnet (for Big Data nodes and internal services) (Example below):

Internet Gateway
NAT Gateway
Route Tables
Security Lists
Click Create.
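As an optional aside (my own addition, using hypothetical /24 subnet ranges, since the wizard picks the actual values), you can sanity-check that your planned subnet CIDRs fit inside the VCN block and do not overlap each other, using Python's standard ipaddress module:

```python
import ipaddress

# VCN block from the wizard, plus two hypothetical subnet ranges
vcn = ipaddress.ip_network("10.0.0.0/16")
public_subnet = ipaddress.ip_network("10.0.0.0/24")   # e.g., for OAC / public endpoints
private_subnet = ipaddress.ip_network("10.0.1.0/24")  # e.g., for Big Data nodes

# Both subnets must fall inside the VCN, and must not overlap each other
print(public_subnet.subnet_of(vcn))             # True
print(private_subnet.subnet_of(vcn))            # True
print(public_subnet.overlaps(private_subnet))   # False
```

This is purely a planning aid; OCI validates the same constraints when you create the subnets.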

3- Set up Service Gateways for private, secure communication between OCI services.
Open the Navigation Menu (☰) → Networking → Virtual Cloud Networks and then click the existing VCN.
In the left panel, click Service Gateways and then click “Create Service Gateway.”
Fill in the following:
Name: BigData-Service-Gateway
Compartment: Select your compartment
VCN: The existing BigData-OAC-VCN-demo
Under Services, choose:
All <region>-Services-in-Oracle-Services-Network (This allows access to all OCI services in your region, including Object Storage.)

Click Create Service Gateway
4- Ensure identity and access management (IAM) policies are properly configured for your team.
OCI Identity and Access Management (IAM) controls who can access what within your Oracle Cloud tenancy. It lets you define users (individual accounts), groups (logical collections of users), compartments (logical containers for resources) and policies (rules that grant permissions for groups in specific compartments).
In the OCI Console, navigate to Identity & Security → Domains → Your Domain (if new, create one) → Groups, and click “Create Group.” Create logical user groups to organize access based on responsibilities — for example, BigDataAdmins for full administrative control over all data and analytics resources, DataEngineers to manage and operate data pipelines and storage, and DataAnalysts with read-only access to analytics tools and the data warehouse.


Once your groups are set up, go to Identity & Security → Domains → Users, click “Create User,” and add your team members (e.g., Alice, Bob, Bruno, Rafael, Laura, Ronaldo, Priya). Then assign each user to the appropriate group according to their role to ensure proper access and governance within your OCI environment.
Next, define IAM Policies that specify what each group is allowed to do within your Oracle Cloud environment. In the OCI Console, navigate to Identity & Security → Policies → Create Policy, and write clear, compartment-based rules to grant appropriate access. For example, you can give administrators full control over all resources in your Big Data and Analytics compartment with a policy such as: Allow group BigDataAdmins to manage all-resources in compartment <compartment_name>.
This ensures administrators have complete authority to configure, monitor, and maintain all services across the data and analytics ecosystem.
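The other two groups can be scoped more narrowly. The statements below are illustrative assumptions (the resource families chosen and the compartment placeholder are mine, not from the original setup) showing how the DataEngineers and DataAnalysts groups might be restricted:

```text
Allow group DataEngineers to manage object-family in compartment <compartment_name>
Allow group DataEngineers to manage autonomous-database-family in compartment <compartment_name>
Allow group DataAnalysts to read autonomous-database-family in compartment <compartment_name>
Allow group DataAnalysts to read analytics-instances in compartment <compartment_name>
```

Adjust the verbs (inspect, read, use, manage) and resource families to match your organization's least-privilege requirements.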


Step 2: Provision the Oracle Big Data Environment
Since the tenancy did not include OCPU quotas for Big Data Service and I lacked IAM access to create Oracle Data Flow applications, I chose to simulate the Spark transformation logic using Python and Pandas.
This local simulation reproduces the essential logic of a Data Flow job: filtering, transforming, and scaling data values to validate the architecture and data model prior to full deployment.
From OCI Console → Navigate to Big Data Service.

To create your Big Data environment, click “Create Cluster” in the OCI Console and choose your preferred Cluster Profile, either Hadoop or Spark (for example), depending on your processing needs.
The technologies listed (Hadoop Extended, Hadoop, HBase, Hive, Spark, Trino, and Kafka) represent key components of the modern big data ecosystem, each with a distinct type and primary use:
- Hadoop: the foundation, a distributed data processing framework mainly used for batch processing and large-scale storage through HDFS (Hadoop Distributed File System) and the MapReduce engine.
- Hadoop Extended: an enhanced version of Hadoop, typically used for optimized performance and management with additional enterprise features.
- HBase: a NoSQL database built on top of Hadoop, designed for real-time read and write operations on massive datasets.
- Hive: a data warehouse system enabling SQL-like querying and analytics on data stored in Hadoop.
- Spark: a fast, in-memory processing engine primarily used for real-time analytics, batch processing, and machine learning.
- Trino (formerly known as PrestoSQL): a distributed SQL query engine for querying data from multiple sources without requiring data movement, providing a unified analysis layer.
- Kafka: a distributed event streaming platform mainly used for real-time data pipelines and stream processing.
Together, these technologies form a cohesive and scalable ecosystem for storing, processing, and analyzing large volumes of data efficiently across both batch and real-time environments.

Specify the Node Count, with a minimum of three nodes recommended for a production-grade setup to ensure scalability and fault tolerance. For secure access, configure Authentication using an SSH key pair. Once all parameters are set, initiate the deployment and wait approximately 20 minutes for the cluster to be fully provisioned and ready for use.

Then I faced…

This message originates from Oracle Cloud Infrastructure (OCI) and indicates that the OCPU (Oracle CPU) quota limit for the selected compute shape (VM.Standard.E4.Flex) has been exceeded. In this case, the current usage is 0 OCPUs, the requested amount is 15 OCPUs, while the maximum allowed limit is 0. Because the account has a limit of zero OCPUs for this shape family, OCI is unable to provision any resources using that instance type.
This issue usually occurs when the tenancy or region has no available OCPU quota for the selected shape, when the Big Data Service (BDS) resource limit has not been increased, or when using a Free Tier account, which does not support this compute shape.
To resolve the issue, you can request a service limit increase by navigating to the OCI Console → Governance & Administration → Limits, Quotas and Usage, searching for VM.Standard.E4.Flex, and submitting a request specifying the number of OCPUs you need. Alternatively, you can select a different compute shape such as VM.Standard.E3.Flex or VM.Standard2.1, if available. You may also try switching to another region that has quota availability. If you are using a Free Tier account, consider upgrading to a paid account to gain access to larger or more flexible compute shapes.

The error message “The BDS Limit: VM.Standard.E4.Flex – Total OCPUs exceeded… max limit: 0” indicates that my account currently has zero quota for Big Data Service compute shapes, preventing me from deploying a full Big Data Service (BDS) cluster. Because only users with tenancy-level privileges (typically Cloud Administrators, which I am not in this environment) can request a quota increase, an immediate workaround is to use Oracle Data Flow, a serverless Spark solution that eliminates the need for manual cluster provisioning. Oracle Data Flow allows me to run Spark jobs or data transformation workloads directly in the Oracle Cloud ecosystem, leveraging Object Storage, Autonomous Data Warehouse (ADW), and Oracle Analytics Cloud (OAC). To use it, navigate to Analytics & AI → Data Flow in the Oracle Cloud Console, click Create Application, upload your Spark script (.py or .jar), set the compartment to match your OAC project, choose a small shape (e.g., 2 OCPUs if available), define input and output paths in Object Storage, and click Run Application. This approach provides a scalable, cost-efficient, and serverless environment for running your data pipelines without requiring additional OCPU quotas. In this case, jump to Step 3.2.

Script example: a Spark script (.py). Replace the values as needed, upload the file to your Oracle Object Storage bucket, and use it in Oracle Data Flow as your Spark test application. I created a simple script that generates sample data in-memory:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Initialize Spark session
spark = SparkSession.builder.appName("SimpleSparkTestNoObjectStorage").getOrCreate()

print("Starting Spark test job without Object Storage dependencies...")

# === Generate Sample Data In-Memory ===
data = [
    ("North", 1200),
    ("South", 800),
    ("East", 0),
    ("West", 1500),
    ("Central", 950)
]
columns = ["region", "amount"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Simple transformation: filter positive sales and calculate scaled value
df_transformed = (
    df.filter(col("amount") > 0)
      .withColumn("scaled_amount", col("amount") * 1.1)
)

# Show the results in Data Flow logs
print("Transformed Data:")
df_transformed.show()

# Write output to a temporary local path (Data Flow ephemeral storage)
output_path = "/tmp/output_spark_test/"
df_transformed.write.mode("overwrite").parquet(output_path)

print(f"Job completed successfully! Data written to {output_path}")

# Stop Spark session
spark.stop()


After configuring the infrastructure and attempting to deploy a Big Data Service cluster, I encountered an OCPU quota limitation that prevented me from continuing with the standard Spark-based workflow. This led me to explore an alternative, accessible approach.
Step 3: Creating and Using the Autonomous AI Database
The Autonomous AI Database (AADB) replaces traditional Autonomous Data Warehouse (ADW) and Autonomous Transaction Processing (ATP) services with a unified, AI-driven platform. It retains the same autonomous features—self-tuning, auto-scaling, and high performance—while introducing AI/ML integration, vector search, and large language model (LLM) support.
For this project, I created an AADB instance with the Lakehouse workload type, which is optimized for analytics and OAC integration. Once the database was available, I accessed Database Actions → Machine Learning to launch Oracle Machine Learning (OML) notebooks and execute the Python-based transformation logic directly inside the database.
Step 3.2: Simulating the Big Data Workflow Without Administrative Privileges
This step replicates the Load phase of a typical ETL process — the same stage where Oracle Data Flow would normally write to ADW.
Initially, my plan was to continue with Oracle Data Flow to demonstrate Spark-based transformations within Oracle Cloud. However, since my current tenancy does not yet include the necessary IAM privileges to create and execute Data Flow applications, I decided to adapt the implementation temporarily. Once IAM permissions are granted, I’ll publish a second post demonstrating how to replace this simulated step with Oracle Data Flow, covering cluster creation, Spark application deployment, and automated data refresh into ADW and OAC.
3.1 Simulate the Transformations (Local / Notebook)
Use Python + Pandas to mirror the Spark logic:
import pandas as pd

# Simulated raw data (as if ingested from source or data lake)
data = [
    ("North", 1200),
    ("South", 800),
    ("East", 0),
    ("West", 1500),
    ("Central", 950)
]
df = pd.DataFrame(data, columns=["region", "amount"])

# Transform (Spark-like): filter invalid rows and scale values
df = df[df["amount"] > 0].copy()
df["scaled_amount"] = df["amount"] * 1.1

print("Transformed data ready for analytics:")
print(df)

# Optional: save a CSV to load into AADB.
df.to_csv("sales_transformed.csv", index=False)
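As a quick sanity check (my own addition, not part of the original notebook), you can assert the expected shape and values of the transformed frame before loading it anywhere:

```python
import pandas as pd

# Rebuild the same simulated dataset and transformation as above
data = [("North", 1200), ("South", 800), ("East", 0), ("West", 1500), ("Central", 950)]
df = pd.DataFrame(data, columns=["region", "amount"])
df = df[df["amount"] > 0].copy()
df["scaled_amount"] = df["amount"] * 1.1

# The zero-amount "East" row should be filtered out, leaving 4 rows,
# and scaling should equal amount * 1.1 up to floating-point rounding
assert len(df) == 4
assert "East" not in df["region"].values
assert abs(df.loc[df["region"] == "North", "scaled_amount"].iloc[0] - 1320.0) < 1e-9
print("All checks passed.")
```

Assertions like these catch logic regressions early, before the data ever reaches the database.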
Creating the Autonomous AI Database Environment
To execute the simulated transformation script and store the resulting dataset, I created an Autonomous AI Database in Oracle Cloud Infrastructure (OCI). This new name represents the evolution of the former Autonomous Data Warehouse (ADW) and Autonomous Transaction Processing (ATP) offerings into a unified, AI-driven data platform. The Autonomous AI Database (AADB) extends the same core capabilities of ADW—self-tuning, self-patching, and elastic scaling—while introducing enhanced integration with AI and Machine Learning (ML) services, Vector Search, and Large Language Model (LLM) support.
From the OCI Console, I navigated to Oracle AI Database → Autonomous AI Database, selected Create Autonomous Database, and chose the Lakehouse workload type, which is optimized for analytics, data engineering, and seamless integration with Oracle Analytics Cloud (OAC). I configured a few basic settings such as display name, compartment, and admin credentials, left the default compute and storage parameters with auto-scaling enabled, and selected Secure access from everywhere to simplify initial testing. Once the database reached the Available state, I accessed it through Database Actions → Machine Learning, launching the Oracle Machine Learning (OML) notebooks to run the Python simulation script directly inside the database environment.



The Lakehouse workload type is the modern replacement for “Autonomous Data Warehouse (ADW)”.


Running the Simulation Code in the Autonomous AI Database
Once the Autonomous AI Database (AADB) instance was created and active, the next step was to run the Python-based data transformation code inside its built-in Oracle Machine Learning (OML) environment. OML provides an interactive notebook interface where Python, SQL, and R scripts can be executed directly within the database, eliminating the need for external tools or infrastructure setup. This makes it ideal for testing and simulating Spark-like transformations, especially in environments without administrative privileges.
Step 4: Executing the Code in Oracle Machine Learning (OML)
Within the OML environment, I created a notebook named BigData_NoAdmin_Simulation, set the interpreter to Python, and ran the transformation script.
The notebook executed the logic directly inside the database engine, producing a transformed DataFrame containing the original and scaled values. This confirmed that even without Spark clusters or Data Flow, OML can simulate data engineering workflows natively.
To access OML, I opened the Database Actions page from the AADB details screen in the OCI Console and selected Machine Learning from the available tools. This launched the OML workspace, where I logged in using my database credentials (for example, the ADMIN user). Inside the workspace, I created a new Python notebook named BigData_NoAdmin_Simulation and selected the default Conda runtime environment.
In a new notebook cell, I pasted the Python code, which replicates the same logic that would normally run in an Oracle Data Flow Spark job.

After setting the interpreter to Python in the OML notebook and running the simulation script, the output is generated directly within the Autonomous AI Database environment, confirming that the transformation logic works as intended. The notebook displays the DataFrame containing the regions, original amounts, and newly calculated scaled values, demonstrating how the transformation mirrors a Spark-based process but runs natively inside Oracle’s managed environment. This step validates the entire “no-admin” simulation approach — proving that Python-based data manipulation can be performed securely within the database itself, without relying on an external Spark cluster or Data Flow configuration.
The next step is to persist this data inside the database for downstream analytics. Using a few lines of Python within the same notebook, the transformed DataFrame can be pushed into the Autonomous AI Database as a permanent table named SALES_TRANSFORMED. Once saved, this table becomes immediately accessible via Database Actions → SQL or any BI connection, including Oracle Analytics Cloud (OAC). In the upcoming section, I’ll connect OAC to this table to visualize the processed data through interactive dashboards — completing the end-to-end simulation pipeline from data generation to insight delivery.
Saving the Transformed Data into the Autonomous AI Database
Once the simulation code has executed successfully and the output DataFrame has been verified inside the notebook, the next step is to save the results as a persistent table within the Autonomous AI Database (AADB). This ensures that the transformed data is stored securely and is immediately accessible to other Oracle Cloud services — including SQL Developer Web and Oracle Analytics Cloud (OAC).
Step 5: Workaround to Persist Data without Administrative Privileges
When I attempted to persist the DataFrame using the oml.push() and .save() pattern, the privilege limitation prevented OML from creating new objects on behalf of the user. The original attempt that triggered the error looked like this:
%python
import oml
# Push the DataFrame into the database as a new table
oml_df = oml.push(df, 'SALES_TRANSFORMED')
# Save the table permanently in the schema
oml_df.save(if_exists='replace')
print(" Table 'SALES_TRANSFORMED' created successfully in the Autonomous AI Database.")
Error:

Fixing the error:
During the data persistence step, I encountered an ORA-06598: insufficient INHERIT PRIVILEGES privilege error when trying to execute the oml.push() command in Oracle Machine Learning (OML). This occurred because the OML internal schema (PYQSYS) needs explicit permission to inherit privileges from the user schema (in this case, ADMIN) before it can create database objects on the user’s behalf. The standard fix is to connect to Database Actions → SQL and run the following grants as the ADMIN user:
GRANT INHERIT PRIVILEGES ON USER PYQSYS TO ADMIN;
GRANT INHERIT PRIVILEGES ON USER ADMIN TO PYQSYS;
GRANT CREATE TABLE, CREATE VIEW, CREATE SEQUENCE TO ADMIN;
GRANT UNLIMITED TABLESPACE TO ADMIN;
ALTER USER ADMIN QUOTA UNLIMITED ON DATA;


In an environment where these grants can be applied (followed by restarting the OML interpreter), the push/save code above executes correctly. In my restricted tenancy I could not run them, and the .save() method used in earlier OML versions has in any case been deprecated in the latest OML4Py release. Instead, I persisted the DataFrame as a permanent table using the modern oml.create() approach:

%python
import oml

# Create SALES_TRANSFORMED as a permanent table directly from the pandas DataFrame
oml.create(df, table='SALES_TRANSFORMED')
print("Table 'SALES_TRANSFORMED' created successfully in the Autonomous AI Database.")

This workaround successfully created the SALES_TRANSFORMED table without requiring cross-schema privilege grants.
This command creates the table SALES_TRANSFORMED directly from the in-memory pandas DataFrame inside the Autonomous AI Database. Once the cell runs successfully, you can verify the table by opening Database Actions → SQL and querying it with a simple statement such as:

SELECT * FROM SALES_TRANSFORMED;
This confirms that the transformation pipeline has been successfully executed end-to-end within the database. The simulation now behaves like a full data engineering flow — transforming data in OML, storing it persistently in the Autonomous AI Database, and preparing it for analysis in Oracle Analytics Cloud (OAC).
Step 6: Connecting to Oracle Analytics Cloud (OAC)
With the transformed data stored in the Autonomous AI Database, I connected Oracle Analytics Cloud (OAC) to the database to visualize the results.
After establishing the connection (using the ADMIN credentials), I created a dataset from the SALES_TRANSFORMED table and built interactive dashboards including:
- Bar Charts: comparing regional sales and scaled values
- KPI Tiles: highlighting top-performing regions
- Pie Charts: showing proportional sales distribution
This visualization step completes the Big Data → AADB → OAC workflow.
Conclusion
This exercise demonstrates that even in environments without full administrative privileges or IAM access, it is possible to build, test, and validate Oracle’s complete data analytics lifecycle — from transformation logic to visualization — entirely within Oracle Cloud Infrastructure.
By using Oracle Autonomous AI Database (AADB), Oracle Machine Learning (OML), and Oracle Analytics Cloud (OAC), data engineers can replicate real-world big data scenarios, simulate Spark-based transformations, and visualize results—all securely within Oracle’s managed environment.
Once IAM and system privileges are granted, this same approach can seamlessly extend to Oracle Data Flow or Big Data Service, transforming the simulation into a fully automated, cloud-native data pipeline.
This experiment reinforces Oracle’s commitment to flexibility, allowing innovation and exploration even in restricted environments—balancing security, scalability, and practical implementation.
Stay tuned! The next step is to extend this project using Oracle Data Flow to complete a fully serverless Big Data pipeline.


*The views expressed here are my own and do not represent those of my employer.*
Hello, I’m Bruno — a dual citizen of Brazil and Sweden. I bring a global perspective shaped by experiences in both South America and Europe, with a strong focus on collaboration and innovation across cultures. I am a Computer Scientist, PhD Candidate in Information and Communication Technologies, focusing on Data Science and Artificial Intelligence, and hold dual Master’s degrees in Data Science and Cybersecurity. With over fifteen years of international experience spanning Brazil, Hungary, and Sweden, I have collaborated with global organizations such as IBM, Playtech, and Oracle, as well as contributed remotely to projects across multiple regions. My professional interests include Databases, Cybersecurity, Cloud Computing, Data Science, Data Engineering, Big Data, Artificial Intelligence, Programming, and Software Engineering, all driven by a deep passion for transforming data into strategic business value.