Defered data for download button #5053

Open
Vinno97 opened this issue Jul 28, 2022 · 37 comments
Labels
feature:st.download_button status:likely Will probably implement but no timeline yet type:enhancement Requests for feature enhancements or new features

Comments

@Vinno97

Vinno97 commented Jul 28, 2022

Problem

The download button currently expects its data to be available when the button is declared. If the data needs to be read from disk (or worse, compiled from multiple disk sources), this can make the app needlessly slow.
In my app, the data downloading is not a common use case, but the packing of the data for downloading is relatively expensive. Caching helps, but only when the data doesn't change.

Solution

I propose a way to load and preprocess the data (archive, pickle, etc.) only when the download is actually requested.

Specifically, the data argument could also accept a function that gets called as soon as the download button is pressed. This callback then returns the actual data.

def get_data():
    data = some_heavy_data_loading()
    return data

st.download_button("Download Data", get_data, file_name="my-data.dat")

Possible additions:

Currently a download button accepts str, bytes, TextIO, BinaryIO, or io.RawIOBase. With deferred loading, it would also be possible to accept a file pointer and stream the data to the user. This might bring huge speed and memory benefits when downloading large files.

Technically this streaming would also be possible without deferred loading, but then you're keeping unnecessary files open.
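
A minimal sketch of the idea proposed above. This is not Streamlit's API: `resolve_download_data` is a hypothetical helper that only illustrates how a widget could accept either ready-made data or a zero-argument callable, and call the callable at click time rather than when the button is declared.

```python
from typing import Callable, List, Union

DownloadData = Union[str, bytes]

# Records when the expensive loader actually runs (for illustration only).
calls: List[str] = []


def get_data() -> bytes:
    """Stand-in for some_heavy_data_loading() from the proposal."""
    calls.append("loaded")
    return b"heavy payload"


def resolve_download_data(
    data: Union[DownloadData, Callable[[], DownloadData]],
) -> bytes:
    """Hypothetical helper: return the payload as bytes.

    If `data` is callable it is invoked here, i.e. only when the
    download is requested, not when the button is declared.
    """
    payload = data() if callable(data) else data
    if isinstance(payload, str):
        return payload.encode("utf-8")
    return bytes(payload)
```

With this shape, declaring the button costs nothing; `get_data` only runs once `resolve_download_data(get_data)` is called in response to a click.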


Community voting on feature requests enables the Streamlit team to understand which features are most important to our users.

If you'd like the Streamlit team to prioritize this feature request, please use the 👍 (thumbs up emoji) reaction in response to the initial post.

@Vinno97 Vinno97 added type:enhancement Requests for feature enhancements or new features status:needs-triage Has not been triaged by the Streamlit team labels Jul 28, 2022
@lukasmasuch lukasmasuch added feature:st.download_button and removed status:needs-triage Has not been triaged by the Streamlit team labels Jul 28, 2022
@lukasmasuch
Collaborator

@Vinno97 Thanks for the suggestion. This would indeed be a nice addition to the download button, especially when dealing with large files. I will forward this feature request to our product team.

@tomgallagher

In the meantime, I'm using this as a way of ensuring that page flow is not interrupted by large file prep:

def customDownloadButton(df):
    if st.button('Prepare downloads'):
        #prep data for downloading
        csv = convert_df(df)
        json_lines = convert_json(df)
        parquet = convert_parquet(df)
        tab1, tab2, tab3 = st.tabs(["Convert to CSV", "Convert to JSON", "Convert to Parquet"])
        with tab1:
            st.download_button('Download', csv, file_name='data.csv')
        with tab2:
            st.download_button('Download', json_lines, file_name='data.json')
        with tab3:
            st.download_button('Download', parquet, file_name='data.parquet')

@jrieke
Collaborator

jrieke commented Jul 30, 2022

Yes agree! Back when we implemented download button, I know that we also thought about allowing users to pass a function. Not sure if we cut that just to reduce scope or if there were any reasons against doing that. Will revisit!

@xR86

xR86 commented Aug 29, 2022

I also had this issue, but it appears that Streamlit already does approximately what you proposed, @Vinno97?
The docs mention that you could have a callback for this.

Not sure if I'm missing some nuance around blocking when downloading large files, but I've already used this to generate data on click, regardless of whether it's data files or octet streams to be saved as files (e.g. zip).

Lifted from the docs:

@st.cache
def convert_df(df):
    # IMPORTANT: Cache the conversion to prevent computation on every rerun
    return df.to_csv().encode('utf-8')

csv = convert_df(my_large_df)

st.download_button(
    label="Download data as CSV",
    data=csv,
    file_name='large_df.csv',
    mime='text/csv',
)

@jrieke Was this functionality added in the meantime and not linked to this issue?

@jrieke
Collaborator

jrieke commented Sep 23, 2022

Nope, we didn't implement this yet. We don't have a timeline yet, but I'm 99% sure we want to do this at some point.

@amirhessam88

Any progress on this? Do we have an ETA for when this bug is gonna be fixed?

@wolfgang-koch

I would appreciate it if this gets resolved. I already tried to address this issue on the forum a couple of months ago: https://discuss.streamlit.io/t/create-download-file-upon-clicking-a-button/32613
My idea was to solve this using some JS, but it's messy and causes some slight shifting down of the page content.

In my opinion, st.download_button should only fill memory with the file's content upon actually clicking the button instead of on every script rerun.

@jzluo

jzluo commented Jan 19, 2023

I'd also like to voice support for this feature. I finally tracked down my app's occasional hanging to this issue. In the meantime, gating the download button behind a "prepare data for download" button like @tomgallagher's example above is a clumsy but okay workaround.

@HStep20

HStep20 commented Jan 28, 2023

This would be a great feature. I know it's highly requested, but when working with APIs, the lack of this feature makes it a miserable experience. The app has to hit the API each time the page is reloaded to prep the download, meaning lots of requests within a quota are used up. It's even worse if you have multiple tabs on a page, each of which downloads a different dataset for the user: it means x API calls per page load, per tab, each time the script is rerun.

I've mitigated it by using a nested button like Tom suggested, to 'get' the data and then show the download button to download it, but a proper way to combine both into one UX action would be amazing.
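
For reference, the two-step mitigation described above can be reduced to a small "build once per session" helper. This is only a sketch: `get_or_prepare` is a hypothetical name, and the helper works on any dict-like object, on the assumption that it is given `st.session_state` in a real app.

```python
from typing import Any, Callable, MutableMapping


def get_or_prepare(
    state: MutableMapping[str, Any],
    key: str,
    build: Callable[[], bytes],
) -> bytes:
    """Build the payload once and reuse it on later reruns.

    `build` stands in for the expensive step (for example an API call);
    it is only invoked when `key` is not yet stored in `state`.
    """
    if key not in state:
        state[key] = build()
    return state[key]


# Intended use inside an app (assumes `import streamlit as st` and a
# hypothetical `fetch_report` function of your own):
#
# if st.button("Prepare download"):
#     get_or_prepare(st.session_state, "report", fetch_report)
# if "report" in st.session_state:
#     st.download_button("Download", st.session_state["report"],
#                        file_name="report.csv")
```

This keeps the API call out of ordinary reruns, at the cost of the extra click the thread is complaining about.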

@masonearles

+1

@ElenaBossolini

Same problem here. In my case I need to generate an Excel file from multiple large pandas DataFrames (one DataFrame per sheet). I write the data as BytesIO.
In my experience, going from the pandas DataFrames to a BytesIO buffer takes about 0.003s, but in the Streamlit app the user is left hanging for multiple seconds, somewhere between 5s and 10s.

@SabraHealthCare

def get_data():
    data = some_heavy_data_loading()
    return data

st.download_button("Download Data", get_data, file_name="my-data.dat")

def get_data():
    st.write("test")
    data = some_heavy_data_loading()
    return data

I added st.write("test") to get_data and found that "test" was printed before the download_button. This means get_data() still runs even when the download button has not been clicked.

@andrewpimm

andrewpimm commented Oct 30, 2023

def get_data():
    data = some_heavy_data_loading()
    return data

st.download_button("Download Data", get_data, file_name="my-data.dat")

def get_data():
    st.write("test")
    data = some_heavy_data_loading()
    return data

I added st.write("test") to get_data and found that "test" was printed before the download_button. This means get_data() still runs even when the download button has not been clicked.

Unless there has been an update that hasn't been announced here, I'm not sure that a function can be called from st.download_button in this way.

@jsulopzs

+1 to this feature, it'd be great for developers to create custom calculators that provide business value and a rich UX.

@CharlesFr

any updates on this feature?

@ViniciusgCaetano

+1

@zbjdonald

any updates on this feature?

@LarsHill
Contributor

I came across this issue as well. Besides large data payloads being created on every run, it is annoying that there is no way to create the data only after the download button is clicked.
In my case the raw data to be downloaded is created and stored as session state "after" the position of the download button in the code. Now when I click the download button the previously created data state is downloaded but not the current state.

Here is an example:

create_data = st.button("Create data")

if "data" not in st.session_state:
    st.session_state.data = None

st.download_button(
    label="Export",
    data=st.session_state.data,
    file_name=file_name,
)

if create_data:
    # logic to create data here
    st.session_state.data = create_data_logic()

Now, if I first click the "Create data" button and then the "Export" button, only None is downloaded.
Only on the next app rerun is the session state available to the download button updated, so that the correct data is downloaded.

If the data creation process could happen in a callback after the download button is clicked, there would be no issue...
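
The ordering problem described above can be reproduced with a toy model of a single script run. This is plain Python, not Streamlit code: the two `rerun_*` functions are hypothetical stand-ins for one top-to-bottom execution of the script, and they return whatever the download button would have been given on that run.

```python
from typing import Any, Dict, Optional


def create_data_logic() -> bytes:
    """Placeholder for the real data creation."""
    return b"fresh data"


def rerun_button_first(state: Dict[str, Any], create_clicked: bool) -> Optional[bytes]:
    """Order used in the example: the button is declared before the data exists."""
    state.setdefault("data", None)
    offered = state["data"]  # value the download button receives on this run
    if create_clicked:
        state["data"] = create_data_logic()  # too late for this run
    return offered


def rerun_create_first(state: Dict[str, Any], create_clicked: bool) -> Optional[bytes]:
    """Alternative order: create the data first, then declare the button."""
    state.setdefault("data", None)
    if create_clicked:
        state["data"] = create_data_logic()
    return state["data"]
```

In the first ordering, the run triggered by "Create data" still hands None to the download button, and the fresh data only shows up one rerun later. In a real app the second ordering is usually obtained either by moving the creation logic above the download button, or by running it in the `on_click` callback of the "Create data" button, since callbacks run before the script reruns.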

Currently this workaround does the job for me, but I feel this should be natively possible in Streamlit without JS hacks...

@anki-code

anki-code commented Feb 8, 2024

Personally, I want to say that Streamlit is very unpleasant for new users: I need to google every step and I continuously run into issues with my use cases. And yes, I want to +1 this bug too, because when I want to download the data I want to click the button, wait for processing, and get the data.

@sfc-gh-pkommini

sfc-gh-pkommini commented Feb 27, 2024

Hi Team,

We currently have the same issue, and it makes st.download_button unusable in production. Is there a workaround until the callback function is added? Also, is there an ETA for the data callback being added?

@goyodiaz

@sfc-gh-pkommini The only workaround I ever found is using two buttons as posted above.

@BenGravell

+1 on this issue.

A super basic use-case is offering users a download of PNG images. This is a typical desire of a user if you want "archival quality" and are willing to eat the storage size; forcing people into JPEG all the time is not nice. PNG being lossless means the file size / data payload is going to be higher. Even a moderately large PNG of dimensions 3072 x 4096 ends up being ~26 MB, which is totally feasible for generating in-memory and offering for one-off downloads. The ask is just to defer the costly serialization operations until the user actually clicks the download button, rather than having to do it every time just to display a download button. The workaround is too fiddly and requires too much ad-hoc state management to really be called a solution IMO.

@iandesj

iandesj commented Apr 3, 2024

My team encountered this bug when apps are deployed in replicas to something like Kubernetes.

@jrieke jrieke added status:likely Will probably implement but no timeline yet and removed status:in-progress We're on it! labels Apr 19, 2024
@jayco10125

2 buttons to do the job of 1 is not a suitable workaround. I end up having to have a save and an export button when really I should just have an export button. It would be much appreciated if this was included; I am very surprised it hasn't been already, since it was requested 2 years ago.

@jrieke
Collaborator

jrieke commented May 14, 2024

Update

Hey all! Sorry for not getting back to this issue for a while. We built a prototype of this last year. However, the implementation was a bit hacky and would have required a bigger effort to get right, so we decided not to pursue it further for the moment.

Given that we recently released partial reruns via st.fragment, there's a good chance we want to rethink the implementation so the function that creates the data can run without rerunning the rest of the script. Other than eng time, there's not really a blocker for this project. Right now, though, we're still working on higher-priority projects. I'll update here once we start working on this again!

@kocielgr

+1

@gauthamkumaran

+1

@QuestMi

QuestMi commented May 31, 2024

Hi, I created a link (an HTML <a> tag) to avoid reloading.

Effect: [image]
    @staticmethod
    def export_to_xlsx(tab_name, data_df):
        io_bytes = io.BytesIO()
        writer = pd.ExcelWriter(io_bytes, engine='xlsxwriter')
        data_df.to_excel(writer, index=False, sheet_name=tab_name)
        worksheet = writer.sheets[tab_name]
        worksheet.set_column("A:AZ", 15)
        writer.book.close()
        b64 = base64.b64encode(io_bytes.getvalue()).decode()
        href = f'<a href="data:application/vnd.ms-excel;base64,' \
               f'{b64}" download="{tab_name}.xlsx" ' \
               f'class="download-btn" style="color: white"> ' \
               f'Download data: {tab_name}</a>'
        st.markdown(f'<style>{DTN_CSS}</style>', unsafe_allow_html=True)
        st.markdown(href, unsafe_allow_html=True)
DTN_CSS = """.download-btn {
                   display: inline-flex;
                   align-items: center;
                   justify-content: center;
                   height: 25px;
                   padding: 0 4px;
                   font-size: 12px;
                   font-weight: bold;
                   line-height: 1.5;
                   text-align: center;
                   text-decoration: none;
                   white-space: nowrap;
                   vertical-align: middle;
                   cursor: pointer;
                   background-color: #FF4B4B;
                   border-radius: 6px;
                   transition: all .1s ease;}
           """

@FJakovljevic

Here is one way to have a button that executes the download function only when clicked.
The added state avoids flicker: when the button is clicked, it starts a rerun of the page from there and creates an empty div at the top of the page (usually not what you want). This way the div is created as the last thing on the page, so it won't affect other elements and there won't be flicker.

This uses the streamlit_javascript library, but it can also be done without it.

import base64
import time

import streamlit as st
from streamlit_javascript import st_javascript


def download_text(text, filename):
    # long time process
    time.sleep(3)
    b64 = base64.b64encode(text.encode()).decode()
    js_function = f"""(function() {{
        var link = document.createElement('a');
        link.href = 'data:text/plain;base64,{b64}';
        link.download = '{filename}';
        link.click();
    }})();"""
    st_javascript(js_function)


# The state is needed if you want to avoid an empty div with an iframe at the top of your page (it makes the page flicker)
def trigger_download():
    st.session_state["trigger_download"] = True

st.write("Execute download function only on click!")
st.button("Download Text", on_click=trigger_download)
if st.session_state.get("trigger_download", False):
    st.session_state["trigger_download"] = False
    download_text("This is the text content to download.", "example.txt")

@Nurgak

Nurgak commented Aug 29, 2024

It seems this is a recurrent feature request. I too am looking for deferred pre-processing before the download is initiated.

The main issue with the current approach is when you need to remotely download (or otherwise on-the-fly pre-process) large files. As of now these need to be loaded into memory, assuming caching is not a solution to get up-to-date data, which would make page loading slow and might not even be needed by the user. In my case that would be database backup exports: many large files, stored remotely, the user might or might not be interested in getting one of them...

@jrieke You mentioned the use of st.fragment, how would that work in this use-case? Do you have an example? Or does this need further development to work with st.download_button/st.button?

Ideally, I'd like st.download_button to accept a function for the data field. That function would need to return a generator or some stream, so large files could be downloaded/pre-processed on-the-fly and then immediately served as chunks to the user, preventing excessive memory usage. I have not thought this through though... this approach might have some issues for very large files or long processing time per chunk...
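
The data side of that idea might look like the sketch below. Nothing in this thread indicates that st.download_button accepts a generator, so this only illustrates the chunked producer; `iter_chunks` and its `chunk_size` default are assumptions made for the example.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the stream's content in pieces of at most `chunk_size` bytes.

    Only one chunk is held in memory at a time, so a large file never
    has to be loaded in full before the first bytes can be served.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        yield chunk


# Small demonstration with an in-memory stream instead of a real file.
demo_chunks = list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4))
```

A function passed as the data argument could return such a generator, which would cover both deferred preparation and bounded memory use. Long processing time per chunk, as noted above, would still need separate handling.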

@laurafiorini

+1

@anayjain

+1

@emanuelbarbera

+1

@gsportelli

+1

@DavideBFerri

+1

@turin1989

+1

@Jokip7

Jokip7 commented Nov 21, 2024

+1
