Introduction
Does your CPQ implementation have a lot of data tables? Do you want to know what data they contain? If you answered yes to both questions, the sample code in this post can simplify how you accomplish these tasks.
First, follow the steps outlined in the post Automate CPQ Data Table Exports, which details how to export one or more CPQ data tables by calling the Export Data Tables API.
Once you’ve downloaded some tables, you can start gaining insight into your company’s data by following the post Analyze CPQ Data Tables with Pandas DataFrames. Pandas is one of the most important Python packages available for data analysis. This post demonstrates streaming the task file zip archive, which includes a separate CSV file for each data table you specified in the Export Data Tables request, and loading those files into Pandas dataframes.
There are a few capabilities that come together to make this possible:
- Stream the response from the Export Data Tables request using the Python requests library’s body content workflow.
- Leverage Python ZipFile’s ability to read and extract “file-like objects”, in other words, streams of data.
- Use Pandas read_csv’s ability to read “file-like objects” into a dataframe.
Streaming the Response
The Python requests library usually downloads the response body immediately, but you can override this behaviour and defer downloading with the stream parameter. Here’s example code that illustrates how to use stream.
import io
import requests
from requests.auth import HTTPBasicAuth

CHUNK_SIZE = 8192

def download_task_file(*args, **kwargs):
    # Create session object
    session = requests.Session()
    # Set authorization
    session.auth = HTTPBasicAuth(kwargs.get('name'), kwargs.get('pwd'))
    with session.get(
            url=kwargs.get('url'),
            params=kwargs.get('params'),
            headers=kwargs.get('headers'),
            stream=True) as response:
        buffer = io.BytesIO()
        # Write the response body into an in-memory buffer, chunk by chunk
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            if chunk:
                buffer.write(chunk)
    return buffer
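Here’s a hedged usage sketch. The URL and credentials below are placeholders, not real endpoint values; the actual task file URL is covered in the Automate CPQ Data Table Exports post.

# Placeholder values for illustration only; substitute your CPQ task file
# endpoint (see the Automate CPQ Data Table Exports post) and credentials.
url = 'https://yoursite.example.com/path/to/task/file'
name = 'integration_user'
pwd = 'changeit'
buffer = download_task_file(url=url, name=name, pwd=pwd)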
Opening Zip Archive and Extracting Files
Python ZipFile allows you to open a zip archive from an io.BytesIO stream. Once open, you can extract selected files to a stream. The example below shows how to accomplish this.
import io
from zipfile import ZipFile

def extract_to_stream(buffer, file_name):
    # Open archive using buffer
    with ZipFile(buffer) as z:
        with z.open(file_name) as f:
            stream = io.BytesIO(f.read())
    return stream

# Example file name
file_name = 'example_data_table.csv'
# Download buffer using the earlier example (placeholder url, name, and pwd as defined there)
buffer = download_task_file(url=url, name=name, pwd=pwd)
# Extract the file from the zip archive to a BytesIO stream
stream = extract_to_stream(buffer, file_name)
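If you don’t know the file names in advance, the standard library’s ZipFile.namelist method lets you discover them before extracting. Here’s a small sketch using the buffer from above:

from zipfile import ZipFile

# Enumerate the CSV files in the downloaded archive (one per exported data table)
with ZipFile(buffer) as z:
    for name in z.namelist():
        print(name)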
Reading Data into Dataframe
The post Analyze CPQ Data Tables with Pandas DataFrames demonstrates how to extract metadata from the downloaded data table extract and how to specify the columns when reading the data. You can expand on that behavior by reading files in chunks. Here’s an example of how to do so using the stream you opened in the earlier example.
import pandas as pd

CHUNK_SIZE = 8192

# Read the stream into a dataframe chunk by chunk, include column names and data types, start import at row 5.
# See https://www.ateam-oracle.com/post/analyze-cpq-data-tables-with-pandas-dataframes for the col_metadata definition.
# Skip the first column: for Data Tables with Natural Keys enabled, a column called '_update_action' is created as the first column.
with pd.read_csv(stream,
                 sep=',',
                 names=list(col_metadata.keys()),
                 dtype=col_metadata,
                 skiprows=5,
                 usecols=range(1, len(col_metadata)),
                 chunksize=CHUNK_SIZE) as reader:
    chunks = []
    # Read each chunk into a list
    for chunk in reader:
        chunks.append(chunk)
# Concatenate the list into one dataframe
df = pd.concat(chunks)
df.info()
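If a table is too large to hold in memory all at once, you can process each chunk as it arrives instead of concatenating them. This is a minimal sketch, assuming the same stream and col_metadata from above; it simply tallies row counts per chunk, and the seek(0) rewinds the BytesIO stream in case the previous example already consumed it.

# Rewind the in-memory stream so we can read it again
stream.seek(0)

total_rows = 0
with pd.read_csv(stream,
                 sep=',',
                 names=list(col_metadata.keys()),
                 dtype=col_metadata,
                 skiprows=5,
                 usecols=range(1, len(col_metadata)),
                 chunksize=CHUNK_SIZE) as reader:
    for chunk in reader:
        # Process each chunk here; this sketch just counts rows
        total_rows += len(chunk)
print(f'Processed {total_rows} rows')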
Summary
That’s all you need to simplify loading your data table extracts into Pandas and stop needlessly writing to disk. Happy hunting!
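As a recap, here’s the whole pipeline in one minimal sketch, reusing the placeholder names (url, name, pwd, col_metadata) from the earlier examples:

# Download the task file, extract one table, and load it into a dataframe,
# all without touching disk. Placeholder variables as defined earlier.
buffer = download_task_file(url=url, name=name, pwd=pwd)
stream = extract_to_stream(buffer, 'example_data_table.csv')
with pd.read_csv(stream,
                 names=list(col_metadata.keys()),
                 dtype=col_metadata,
                 skiprows=5,
                 usecols=range(1, len(col_metadata)),
                 chunksize=CHUNK_SIZE) as reader:
    df = pd.concat(reader)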
