Examples

This page provides some examples to help familiarise the user with the use cases for Streamly.

Basic

Here is the simple, contrived example from the GitHub README:

import streamly


my_stream = io.BytesIO(
b"""Header
Metadata
Unwanted
=
Garabage

Report Fields:
col1,col2,col3,col4,col5
data,that,we,actually,want
and,potentially,loads,of,it,
foo,bar,baz,lorem,ipsum
foo,bar,baz,lorem,ipsum
foo,bar,baz,lorem,ipsum
...,...,...,...,...
Grand Total:,0,0,1000,0
More
Footer
Garbage
"""
)

wrapped_stream = streamly.Streamly(my_stream,
    header_row_identifier=b"Report Fields:\n",
    footer_identifier=b"Grand")

data = wrapped_stream.read(50)
while data:
    print(data)
    data = wrapped_stream.read(50)

HTTP Response

Please note that this example requires requests.

As mentioned in Getting Started, a common use case where Streamly can help is when dealing with an “unclean” HTTP response, i.e. a report returned by a digital marketing API. We’ll use some test data from the GitHub repository to demonstrate the use case here. Ensure you configure the output_file_path variable below:

import gzip

import requests
import streamly


# change this to the location you want to write to
output_file_path = "output.txt"

url = ("https://raw.githubusercontent.com/adamcunnington/"
       "Streamly/master/tests/data/test_data_1.txt")
raw_stream = requests.get(url, stream=True).raw
# raw.githubusercontent.com returns gzip encoded content
decompressor = gzip.GzipFile(fileobj=raw_stream)
wrapped_stream = streamly.Streamly(decompressor,
    header_row_identifier=b"Fields:\n", footer_identifier=b"Grand")

data = wrapped_stream.read()
if data:
    with open(output_file_path, "wb") as fp:
        while data:
            fp.write(data)
            data = wrapped_stream.read()

Navigate to output_file_path to see the output data.

Merging Files

Another example would be use Streamly to merge files. For the purposes of demonstration, start by manually downloading the following files to the same directory of your choice:

Then configure the files_dir_path variable below:

import os

import streamly


files_dir_path = "/home/<username>/Downloads/"

part_1 = os.path.join(files_dir_path, "test_data_1.txt")
part_2 = os.path.join(files_dir_path, "test_data_1 - page 2.txt")

kwargs = {"encoding": "utf8", "newline": ""}

with open(part_1, **kwargs) as fp1:
    with open(part_2, **kwargs) as fp2:
        wrapped_streams = streamly.Streamly(fp1, fp2, binary=False,
            header_row_identifier="Fields:\n", footer_identifier="Grand")
        # Large read size as we're just reading from disk
        data = wrapped_streams.read(100000)
        if data:
            with open(os.path.join(files_dir_path, "output.txt"),
                      "f", **kwargs) as fp_out:
                while data:
                    fp_out.write(data)
                    data = wrapped_streams.read(100000)

Navigate to the output.txt file @ files_dir_path to see the output data.