Examples

This page provides examples that illustrate common use cases for Streamly.

Basic

Here is the simple, contrived example from the GitHub README:

import io

import streamly


my_stream = io.BytesIO(
b"""Header
Metadata
Unwanted
=
Garbage

Report Fields:
col1,col2,col3,col4,col5
data,that,we,actually,want
and,potentially,loads,of,it,
foo,bar,baz,lorem,ipsum
foo,bar,baz,lorem,ipsum
foo,bar,baz,lorem,ipsum
...,...,...,...,...
Grand Total:,0,0,1000,0
More
Footer
Garbage
"""
)

wrapped_stream = streamly.Streamly(my_stream,
    header_row_identifier=b"Report Fields:\n",
    footer_identifier=b"Grand")

data = wrapped_stream.read(50)
while data:
    print(data)
    data = wrapped_stream.read(50)
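
To make the effect of the two identifiers concrete, the sketch below performs the same kind of trimming on an in-memory bytes object using only the standard library. It is not how Streamly is implemented (Streamly works incrementally on a stream). trim_report is a hypothetical helper, and it assumes the semantics the example above suggests: output starts immediately after the header row identifier and stops just before the footer identifier.

```python
def trim_report(raw: bytes, header_row_identifier: bytes,
                footer_identifier: bytes) -> bytes:
    """Hypothetical helper: return the bytes between the two identifiers.

    Assumption (not taken from Streamly's source): the data of interest
    starts right after the header row identifier and ends right before
    the footer identifier.
    """
    # Skip everything up to and including the header row identifier.
    _, found, rest = raw.partition(header_row_identifier)
    if not found:
        rest = raw  # identifier missing: keep the input unchanged
    # Drop the footer identifier and everything after it.
    body, _, _ = rest.partition(footer_identifier)
    return body


sample = (b"Header\nUnwanted\n\nReport Fields:\n"
          b"col1,col2\nfoo,bar\n"
          b"Grand Total:,0\nFooter\n")
trimmed = trim_report(sample, b"Report Fields:\n", b"Grand")
print(trimmed)
```

Unlike this sketch, which needs the whole payload in memory, the wrapped stream in the example above is consumed in chunks through read().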

HTTP Response

Note that this example requires the requests library.

As mentioned in Getting Started, a common use case for Streamly is cleaning up an “unclean” HTTP response, e.g. a report returned by a digital marketing API. We’ll use some test data from the GitHub repository to demonstrate this. Make sure you set the output_file_path variable below:

import gzip

import requests
import streamly


# change this to the location you want to write to
output_file_path = "output.txt"

url = ("https://raw.githubusercontent.com/adamcunnington/"
       "Streamly/master/tests/data/test_data_1.txt")
raw_stream = requests.get(url, stream=True).raw
# raw.githubusercontent.com returns gzip encoded content
decompressor = gzip.GzipFile(fileobj=raw_stream)
wrapped_stream = streamly.Streamly(decompressor,
    header_row_identifier=b"Fields:\n", footer_identifier=b"Grand")

data = wrapped_stream.read()
if data:
    with open(output_file_path, "wb") as fp:
        while data:
            fp.write(data)
            data = wrapped_stream.read()

Open the file at output_file_path to see the output data.
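
The read-and-write loop above can also be expressed with shutil.copyfileobj from the standard library, which repeatedly calls read(length) on the source and writes each chunk until read() returns an empty value. The sketch below uses an io.BytesIO object as a stand-in for wrapped_stream so that it runs without network access or Streamly installed. It assumes the wrapped stream's read(size) behaves like an ordinary file's, and the temporary output location is chosen only for illustration.

```python
import io
import shutil
import tempfile
from pathlib import Path

# Stand-in for wrapped_stream so the sketch runs without network access;
# any object with a file-like read(size) method should work the same way.
source = io.BytesIO(b"col1,col2\nfoo,bar\n")

# Illustrative location only; use your own output_file_path in practice.
output_file_path = Path(tempfile.mkdtemp()) / "output.txt"
with open(output_file_path, "wb") as fp:
    # Calls source.read(8192) repeatedly and writes each chunk
    # until read() returns an empty bytes object.
    shutil.copyfileobj(source, fp, 8192)

written = output_file_path.read_bytes()
```

One difference from the loop above: this version creates the output file even when the stream yields no data.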

Merging Files

Another use case is merging files with Streamly. For demonstration purposes, start by manually downloading the following files to the same directory of your choice:

test_data_1

test_data_1 - page 2

Then configure the files_dir_path variable below:

import os

import streamly


files_dir_path = "/home/<username>/Downloads/"

part_1 = os.path.join(files_dir_path, "test_data_1.txt")
part_2 = os.path.join(files_dir_path, "test_data_1 - page 2.txt")

kwargs = {"encoding": "utf8", "newline": ""}

with open(part_1, **kwargs) as fp1:
    with open(part_2, **kwargs) as fp2:
        wrapped_streams = streamly.Streamly(fp1, fp2, binary=False,
            header_row_identifier="Fields:\n", footer_identifier="Grand")
        # Large read size as we're just reading from disk
        data = wrapped_streams.read(100000)
        if data:
            # "w" opens the output file for writing in text mode
            with open(os.path.join(files_dir_path, "output.txt"),
                      "w", **kwargs) as fp_out:
                while data:
                    fp_out.write(data)
                    data = wrapped_streams.read(100000)

Open the output.txt file in files_dir_path to see the output data.
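
With more than two parts, nesting one with statement per file becomes unwieldy. A possible alternative, sketched here with stand-in files and hypothetical names (part_paths, parts), is contextlib.ExitStack from the standard library, which manages any number of open files in a single block. Passing the opened files on relies only on the multi-stream call shown above, and is left as a comment so that the sketch runs without Streamly installed.

```python
import contextlib
import os
import tempfile

# Create two small stand-in files so the sketch is runnable as-is.
files_dir_path = tempfile.mkdtemp()
part_paths = []
for i, text in enumerate(["col1,col2\nfoo,bar\n", "col1,col2\nbaz,qux\n"]):
    path = os.path.join(files_dir_path, f"part_{i}.txt")
    with open(path, "w", encoding="utf8", newline="") as fp:
        fp.write(text)
    part_paths.append(path)

kwargs = {"encoding": "utf8", "newline": ""}
with contextlib.ExitStack() as stack:
    # Every file registered here is closed when the block exits,
    # however many paths there are.
    parts = [stack.enter_context(open(p, **kwargs)) for p in part_paths]
    # The opened files could then be passed on as in the example above,
    # e.g. streamly.Streamly(*parts, binary=False, ...).
    first_lines = [fp.readline() for fp in parts]

all_closed = all(fp.closed for fp in parts)
```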
