Introduction
Does your CPQ implementation have a lot of data tables? Do you want to know what data they contain? If you answered yes to both questions, the sample code in this post can simplify how you accomplish these tasks.
First, follow the steps outlined in the post Automate CPQ Data Table Exports, which details how to export one or more CPQ data tables by calling the Export Data Tables API.
Once you’ve downloaded some tables, you can start gaining insight into your company’s data by following the post Analyze CPQ Data Tables with Pandas DataFrames. Pandas is one of the most important Python packages available for data analysis. This post demonstrates streaming the task file zip archive, which includes a separate CSV file for each data table you specified in the Export Data Tables request, and loading those files into Pandas dataframes.
There are a few capabilities that come together to make this possible:
- Stream the response from the Export Data Tables request using the Python requests library’s body content workflow.
- Leverage Python ZipFile’s ability to read and extract “file-like objects”, in other words, streams of data.
- Use Pandas read_csv’s ability to read “file-like objects” into a dataframe.
Streaming the Response
The Python requests library usually downloads the response body immediately, but you can override this behaviour and defer downloading with the stream parameter. Here’s example code that illustrates how to use stream.
import io
import requests
from requests.auth import HTTPBasicAuth

CHUNK_SIZE = 8192

def download_task_file(*args, **kwargs):
    # Create session object
    session = requests.Session()
    # Set authorization
    session.auth = HTTPBasicAuth(kwargs.get('name'), kwargs.get('pwd'))
    with session.get(
            url=kwargs.get('url'),
            params=kwargs.get('params'),
            headers=kwargs.get('headers'),
            stream=True) as response:
        buffer = io.BytesIO()
        # Write the response body into an in-memory buffer, chunk by chunk
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            if chunk:
                buffer.write(chunk)
    return buffer
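Here’s a hedged usage sketch. The URL and credentials below are placeholders, not real endpoint values; the actual task file URL is covered in the Automate CPQ Data Table Exports post.

# Placeholder values for illustration only; substitute your CPQ task file
# endpoint (see the Automate CPQ Data Table Exports post) and credentials.
url = 'https://yoursite.example.com/path/to/task/file'
name = 'integration_user'
pwd = 'changeit'
buffer = download_task_file(url=url, name=name, pwd=pwd)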
Opening Zip Archive and Extracting Files
Python ZipFile allows you to open a zip archive from an io.BytesIO stream. Once open, you can extract selected files to a stream. The example below shows how to accomplish this.
import io
from zipfile import ZipFile

def extract_to_stream(buffer, file_name):
    # Open archive using buffer
    with ZipFile(buffer) as z:
        with z.open(file_name) as f:
            stream = io.BytesIO(f.read())
    return stream

# Example file name
file_name = 'example_data_table.csv'
# Download buffer using the earlier example (placeholder url, name, and pwd as defined there)
buffer = download_task_file(url=url, name=name, pwd=pwd)
# Extract the file from the zip archive to a BytesIO stream
stream = extract_to_stream(buffer, file_name)
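If you don’t know the file names in advance, the standard library’s ZipFile.namelist method lets you discover them before extracting. Here’s a small sketch using the buffer from above:

from zipfile import ZipFile

# Enumerate the CSV files in the downloaded archive (one per exported data table)
with ZipFile(buffer) as z:
    for name in z.namelist():
        print(name)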
Reading Data into Dataframe
The post Analyze CPQ Data Tables with Pandas DataFrames demonstrates how to extract metadata from the downloaded data table extract and how to specify the columns when reading the data. You can expand on that behavior by reading files in chunks. Here’s an example of how to do so using the stream you opened in the earlier example.
import pandas as pd

CHUNK_SIZE = 8192

# Read the stream into a dataframe chunk by chunk, include column names and data types, start import at row 5.
# See https://www.ateam-oracle.com/post/analyze-cpq-data-tables-with-pandas-dataframes for the col_metadata definition.
# Skip the first column: for Data Tables with Natural Keys enabled, a column called '_update_action' is created as the first column.
with pd.read_csv(stream,
                 sep=',',
                 names=list(col_metadata.keys()),
                 dtype=col_metadata,
                 skiprows=5,
                 usecols=range(1, len(col_metadata)),
                 chunksize=CHUNK_SIZE) as reader:
    chunks = []
    # Read each chunk into a list
    for chunk in reader:
        chunks.append(chunk)
# Concatenate the list into one dataframe
df = pd.concat(chunks)
df.info()
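If a table is too large to hold in memory all at once, you can process each chunk as it arrives instead of concatenating them. This is a minimal sketch, assuming the same stream and col_metadata from above; it simply tallies row counts per chunk, and the seek(0) rewinds the BytesIO stream in case the previous example already consumed it.

# Rewind the in-memory stream so we can read it again
stream.seek(0)

total_rows = 0
with pd.read_csv(stream,
                 sep=',',
                 names=list(col_metadata.keys()),
                 dtype=col_metadata,
                 skiprows=5,
                 usecols=range(1, len(col_metadata)),
                 chunksize=CHUNK_SIZE) as reader:
    for chunk in reader:
        # Process each chunk here; this sketch just counts rows
        total_rows += len(chunk)
print(f'Processed {total_rows} rows')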
Summary
That’s all you need to simplify loading your data table extracts into Pandas and stop needlessly writing to disk. Happy hunting!
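As a recap, here’s the whole pipeline in one minimal sketch, reusing the placeholder names (url, name, pwd, col_metadata) from the earlier examples:

# Download the task file, extract one table, and load it into a dataframe,
# all without touching disk. Placeholder variables as defined earlier.
buffer = download_task_file(url=url, name=name, pwd=pwd)
stream = extract_to_stream(buffer, 'example_data_table.csv')
with pd.read_csv(stream,
                 names=list(col_metadata.keys()),
                 dtype=col_metadata,
                 skiprows=5,
                 usecols=range(1, len(col_metadata)),
                 chunksize=CHUNK_SIZE) as reader:
    df = pd.concat(reader)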
