doc: website and version

- update all relevant website content for curve calibrator and robynpy
- bump up version
- adapt curve_type to saturation_reach_hill to account for future options
- update maintainers
facebookexperimental · Dec 19, 2024 · 8eb5171 · 8eb5171
1 parent bcd0a13
commit 8eb5171

Showing 7 changed files with 101 additions and 14 deletions.
diff --git a/R/DESCRIPTION b/R/DESCRIPTION
@@ -1,14 +1,14 @@
 Package: Robyn
 Type: Package
 Title: Semi-Automated Marketing Mix Modeling (MMM) from Meta Marketing Science 
-Version: 3.11.1.9004
+Version: 3.12.0.9000
 Authors@R: c(
-    person("Gufeng", "Zhou", , "[email protected]", c("aut")),
-    person("Bernardo", "Lares", , "[email protected]", c("cre","aut")),
-    person("Leonel", "Sentana", , "[email protected]", c("aut")),
+    person("Gufeng", "Zhou", , "[email protected]", c("cre", "aut")),
     person("Igor", "Skokan", , "[email protected]", c("aut")),
+    person("Bernardo", "Lares", , "[email protected]", c("aut")),
+    person("Leonel", "Sentana", , "[email protected]", c("aut")),
     person("Meta Platforms, Inc.", role = c("cph", "fnd")))
-Maintainer: Bernardo Lares <laresbernardo@gmail.com>
+Maintainer: Gufeng Zhou <gufeng@meta.com>
 Description: Semi-Automated Marketing Mix Modeling (MMM) aiming to reduce human bias by means of ridge regression and evolutionary algorithms, enables actionable decision making providing a budget allocation and diminishing returns curves and allows ground-truth calibration to account for causation.
 Depends:
     R (>= 4.0.0)

diff --git a/R/R/calibration.R b/R/R/calibration.R
@@ -13,7 +13,7 @@
 #' @inheritParams robyn_run
 #' @param df_curve data.frame. Requires two columns named spend and response.
 #' Recommended sources of truth are Halo R&F or Meta conversion lift.
-#' @param curve_type Character. Currently only allows "saturation_reach"
+#' @param curve_type Character. Currently only allows "saturation_reach_hill"
 #' and only supports Hill function.
 #' @param force_shape Character. Allows c("c", "s") with default NULL that's no
 #' shape forcing. It's recommended for offline media to have "c" shape, while
@@ -45,7 +45,7 @@
 #' # Using reach saturation from Halo as proxy
 #' curve_out <- robyn_calibrate(
 #'   df_curve = df_curve_reach_freq,
-#'   curve_type = "saturation_reach"
+#'   curve_type = "saturation_reach_hill"
 #' )
 #' # For the simulated reach and frequency dataset, it's recommended to use
 #' # "reach 1+" for gamma lower bound and "reach 10+" for gamma upper bound
@@ -77,7 +77,7 @@ robyn_calibrate <- function(
   # hp_bounds format
   # hp_interval
 
-  if (curve_type == "saturation_reach") {
+  if (curve_type == "saturation_reach_hill") {
     curve_collect <- list()
     for (i in unique(df_curve$freq_bucket)) {
       message(">>> Fitting ", i)

diff --git a/R/man/robyn_calibrate.Rd b/R/man/robyn_calibrate.Rd
diff --git a/website/docs/features.mdx b/website/docs/features.mdx
@@ -333,7 +333,7 @@ As depicted in plot 4 in [session model onepager](#model-onepager) below, the k-
 
 ---
 
-## Calibration with causal experiments
+## Calibration of average effect size with causal experiments
 
 Randomised controlled trial (RCT) is an established academic gold standard to infer causality in science. By applying results from RCT in ads measurement that are considered ground truth, you can introduce causality into your marketing mix models. Robyn implements the MMM calibration as an objective function in the multi-objective optimization by parameterizing the difference between causal results and predicted media contribution.
 
@@ -363,6 +363,55 @@ There're two major types of experiements in ads measurement, as pointed out by t
 
 Robyn accepts a dataframe as calibration input in the `robyn_inputs()` function. The function usage can be found in the [demo](https://github.com/facebookexperimental/Robyn/blob/main/demo/demo.R#L262).
 
+---
+
+## Holistic calibration
+
+### Rethinking calibration
+
+The triangulation of MMM, experiments and attribution is the centerpiece of [modern measurement](https://www.facebook.com/business/news/advanced-measurement-strategy). While there's no universally accepted definition of calibration, it often refers to the adjustment of the estimated impact of media between different measurement solutions. In MMM, calibration usually refers to adjusting the average effect size of a certain channel by causal experiments, as explained in detail above. However, the average effect size, or the beta coefficient in a regression model, is not the only estimate in an MMM system. The bare minimum set of estimates in MMM includes the **average effect size, adstock and saturation**. Just like the effect size, adstock and saturation are uncertain parametric quantities that can and should be calibrated by ground truth whenever possible. We believe that **holistic calibration** is the next step in triangulation and integrated marketing measurement.
+
+<img alt="Reach & frequency curve calibration" src={useBaseUrl('img/curve_calibrator.png')} />
+
+### The curve calibrator (beta)
+
+Robyn is releasing a new feature **"the curve calibrator"** `robyn_calibrate()` as a step towards holistic calibration. The first use case is to calibrate the response saturation curve using cumulative reach and frequency data as input. This type of data is usually available as siloed media reports for most offline and online channels. A recent source of reach and frequency data is **[Project Halo](https://wfanet.org/leadership/cross-media-measurement)**, an industry-wide collaboration to improve cross-channel reach deduplication. The above graphic is derived and simulated based on a real Halo dataset with cumulative spend and cumulative reach by frequency buckets. For example, "reach 3+" counts the people reached with at least 3 impressions. There's certainly a gap between saturation of reach and business outcome (purchase, sales etc.). However, they're also interconnected along the same conversion funnel (upper funnel -> lower funnel), and reach & frequency saturation curves are often more readily available. Therefore, we're exploring the potential of using reach & frequency to guide response saturation estimation.
+
+According to [a recent paper](https://arxiv.org/abs/2408.07678) from the Wharton School and the London Business School by Dew, Padilla and Shchetkina, a common MMM cannot reliably identify saturation parameters. To quote: _"as practitioners attempt to capture increasingly complex effects in MMMs, like nonlinearities and dynamics, our results suggest caution is warranted: the simple data used for building such models often cannot uniquely identify such complexity."_ In other words, saturation should be calibrated by ground truth whenever possible.
+
+**Our approach for saturation calibration by reach & frequency**: Assume an extreme situation where every user sees the first impression and purchases immediately. In such a case, the response saturation curve equals the reach 1+ saturation curve. The [Hill function](https://facebookexperimental.github.io/Robyn/docs/features#saturation) is a popular choice for saturation transformation and is implemented in Robyn, where gamma controls the inflexion point. A lower gamma means earlier and faster saturation at a lower spend level. We believe that the cumulative reach 1+ curve represents the earliest inflexion, thus it serves as a reasonable lower boundary for gamma for a selected channel. As frequency increases, the inflexion point is delayed and approaches the hidden true response curve. In the dummy dataset, we've simulated reach 10+ to represent the upper bound for gamma. The "best converting frequency" varies strongly across verticals. We believe that reach & frequency is one step closer to identifying the true saturation relationship. Use domain expertise to further narrow down or widen the bounds. For alpha, we recommend keeping the value flexible as in the default.
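
A minimal Python sketch of the gamma behaviour described above, not taken from this commit. It assumes the two-parameter Hill form in which gamma places the inflexion point between the minimum and maximum observed spend. The name `hill_saturation` and the spend grid are illustrative and not part of the Robyn API.

```python
def hill_saturation(spend, alpha, gamma):
    """Two-parameter Hill transform of a list of spend values, in [0, 1)."""
    lo, hi = min(spend), max(spend)
    # Assumption: gamma in (0, 1] interpolates the inflexion point
    # between the lowest and highest spend level.
    inflexion = (1 - gamma) * lo + gamma * hi
    return [x**alpha / (x**alpha + inflexion**alpha) for x in spend]


# Illustrative spend grid from 0 to 100.
spend = [float(x) for x in range(0, 101)]

# Same alpha, different gamma: the lower gamma reaches half of its
# maximum response at a lower spend level.
early = hill_saturation(spend, alpha=2.0, gamma=0.3)
late = hill_saturation(spend, alpha=2.0, gamma=0.8)

print("response at spend 30:", round(early[30], 3), "vs", round(late[30], 3))
```

With this parameterisation the curve passes through 0.5 at the inflexion point, so the lower-gamma curve lies above the higher-gamma curve at every positive spend level.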
+
+The below graphic is an exemplary visualisation of a curve fitting process, where Nevergrad is used to estimate alpha, gamma and beta. Note that the distributions of alpha and gamma are often multimodal and non-normal, because they reflect the hyperparameter optimization path of Nevergrad rather than their underlying distribution.
+
+<img alt="Reach & frequency curve fitting" src={useBaseUrl('img/curve_calibrator_onepager.png')} />
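
The fitting idea above can be sketched in a few lines of Python. This is a hedged illustration only: it swaps Nevergrad for a plain grid search over alpha and gamma, solves beta by least squares, and runs on synthetic reach curves generated from known parameters. The names `hill`, `fit_hill`, `reach_1plus` and `reach_10plus` are hypothetical and do not reflect how `robyn_calibrate()` is implemented.

```python
def hill(spend, alpha, gamma):
    """Two-parameter Hill curve in [0, 1); gamma positions the inflexion point."""
    lo, hi = min(spend), max(spend)
    inflexion = (1 - gamma) * lo + gamma * hi
    return [x**alpha / (x**alpha + inflexion**alpha) for x in spend]


def fit_hill(spend, response, alphas, gammas):
    """Grid-search alpha and gamma; solve beta by least squares for each pair."""
    best = None
    for alpha in alphas:
        for gamma in gammas:
            h = hill(spend, alpha, gamma)
            # For fixed alpha and gamma, the least-squares beta is <h, y> / <h, h>.
            beta = sum(a * y for a, y in zip(h, response)) / sum(a * a for a in h)
            sse = sum((beta * a - y) ** 2 for a, y in zip(h, response))
            if best is None or sse < best["sse"]:
                best = {"alpha": alpha, "gamma": gamma, "beta": beta, "sse": sse}
    return best


# Synthetic cumulative spend and reach, generated from known Hill parameters.
spend = [float(x) for x in range(1, 101)]
reach_1plus = [1000 * h for h in hill(spend, alpha=1.0, gamma=0.2)]
reach_10plus = [400 * h for h in hill(spend, alpha=2.0, gamma=0.6)]

alphas = [i / 2 for i in range(1, 7)]    # 0.5, 1.0, ..., 3.0
gammas = [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.0

fit_lower = fit_hill(spend, reach_1plus, alphas, gammas)
fit_upper = fit_hill(spend, reach_10plus, alphas, gammas)

# Lower and upper gamma bounds taken from the two fitted curves.
gamma_bounds = (fit_lower["gamma"], fit_upper["gamma"])
print(gamma_bounds)
```

Because the synthetic curves are noise-free and their true parameters lie on the grid, the search recovers the generating gamma of each curve. Real reach data would call for a finer grid or a proper optimizer.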
+
+To try out `robyn_calibrate()`, please see [this tutorial](https://github.com/facebookexperimental/Robyn/blob/main/demo/demo.R#L200) in the demo.
+
+```
+library(Robyn)
+data("df_curve_reach_freq")
+
+# Using reach saturation as proxy
+curve_out <- robyn_calibrate(
+  df_curve = df_curve_reach_freq,
+  curve_type = "saturation_reach_hill"
+)
+# For the simulated reach and frequency dataset, it's recommended to use
+# "reach 1+" for gamma lower bound and "reach 10+" for gamma upper bound
+facebook_I_gammas <- c(
+  curve_out[["curve_collect"]][["reach 1+"]][["hill"]][["gamma_best"]],
+  curve_out[["curve_collect"]][["reach 10+"]][["hill"]][["gamma_best"]])
+print(facebook_I_gammas)
+
+```
+
+### Customizable for 3rd-party MMM
+While the curve calibrator is released within the Robyn package, it can be used standalone without having built a model in Robyn. The current beta version is piloting the two-parametric Hill function for saturation. Any MMM solution, not only Robyn, that employs the two-parametric Hill function can be calibrated by the curve calibrator.
+
+In the future, we're planning to partner with our community, advertisers, agencies and measurement vendors to further explore this area and to expand the curve calibrator to cover other popular nonlinear functions for saturation (e.g. exponential, arctan or power function) as well as adstock (geometric or Weibull function).
+
+
+
 ---
 ## Model onepager
 

diff --git a/website/docs/robyn-api.mdx b/website/docs/robyn-api.mdx
@@ -1,15 +1,53 @@
 ---
 id: robyn-api
-title: Robyn API for Python
+title: Robyn Python (Beta)
 ---
 
 import useBaseUrl from '@docusaurus/useBaseUrl';
 
-Enabling Robyn for Python has been a long-standing ask from the community. Robyn has started as an experimental R package. While we understand the needs of the users, it's difficult to maintain a natively translated Python package during active development on the R-side. 
+## Alternative 1: Quick start for RobynPy (Beta)
+
+The Python version of Robyn is rewritten from Robyn's R package version 3.11.1 using object-oriented programming principles and a modular architecture. It was developed with the help of various LLMs and AI workflows such as Llama. As with any AI-assisted translation from one language to another, we anticipate that there could be some issues in the translation from R to Python. However, we believe in the power of community collaboration and open-source contribution. Therefore, we are opening this project to the community to participate and contribute. Together, we can address and resolve any issues that may arise, enhancing the functionality and efficiency of the Python version of Robyn. We look forward to your contributions and to the continuous improvement of this project.
+
+
+### 1. Installing the package
+
+Install the latest Robyn Python package version from PyPI:
+```
+pip3 install robynpy
+```
+
+Install from GitHub using requirements.txt:
+```
+pip3 install -r requirements.txt
+```
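
A small, hedged way to confirm that the install above succeeded, using only the Python standard library. The distribution name `robynpy` comes from the `pip3 install` command in this section; the helper `installed_version` is an illustrative name, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Look up the distribution installed by `pip3 install robynpy`.
robynpy_version = installed_version("robynpy")
print("robynpy:", robynpy_version or "not installed")
```

The lookup reads package metadata only, so it does not import the package or trigger any of its dependencies.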
+### 2. Getting started
+
+The directory `python/src/robyn/tutorials` contains tutorials for the most common scenarios. The tutorials use the simulated dataset provided in the package.
+
+### 3. Running end-to-end
+
+There are two ways of running Python Robyn.
+
+**Option 1:**
+
+`tutorial1.ipynb` is the main notebook that runs the end-to-end flow. It is designed for the majority of users, who would prefer a one-click solution that runs the Robyn flow end-to-end with minimal knowledge of the underlying logic. It should run without any changes if you wish to use the simulated dataset for testing purposes.
+
+This notebook uses APIs available in python/src/robyn/robyn.py to set the configs, run feature engineering, run model training, evaluate models with clustering, generate one pagers and perform budget allocation.
+
+Change any of the configs directly in the notebook and avoid changes to robyn.py for what can be configurable.
+
+**Option 2:**
+
+`tutorial1_src.ipynb` runs the end-to-end flow of Robyn Python with a lot more flexibility. It is designed for users who would like more control over which modules are and aren't run (i.e. skipping clustering, one-pager plots, budget allocation etc.). It should run without any changes if you wish to use the simulated dataset for testing purposes.
+
+This notebook doesn't use APIs available in python/src/robyn/robyn.py but instead, calls the modules directly with the appropriate parameters. In this way, it is more flexible but still expects the users to understand the underlying logic that may change when using various parameter values.
+
+## Alternative 2: The Python wrapper
 
 The idea of a plumber API for Python is originally proposed by [Alex Rowley](https://www.facebook.com/groups/robynmmm/posts/1493036524797809/) from the Robyn community in August 2023. The Robyn team has assessed the proposal that is not only a great work-around for Python users to start with Robyn, but it actually allows API calls from any languages. We're very grateful for the collective wisdom of the open source community.
 
 #### Robyn API for Python beta release
-The first version of the API is released on Nov.22nd 2023 on the [Meta OST summit](https://metaostsummit23.splashthat.com/?fbclid=IwAR1SRBTZGw0GIoaF0XJq_eCWFZsZbyK0KP7P4RLKoee1IVbs8H56so3giwg). This [Jupyter notebook](https://github.com/facebookexperimental/Robyn/blob/main/robyn_api/robyn_python_notebook.ipynb) shows how to call the API from Python. In the beta version, the user needs to have the Robyn R package successfully installed first. We'll work on the migitation of installation friction in the future.  
+The first version of the API is released on Nov.22nd 2023 on the [Meta OST summit](https://metaostsummit23.splashthat.com/?fbclid=IwAR1SRBTZGw0GIoaF0XJq_eCWFZsZbyK0KP7P4RLKoee1IVbs8H56so3giwg). This [Jupyter notebook](https://github.com/facebookexperimental/Robyn/blob/main/robyn_api/robyn_python_notebook.ipynb) shows how to call the API from Python. In the beta version, the user needs to have the Robyn R package successfully installed first. We'll work on the migitation of installation friction in the future.
 
 <img alt="Robyn API for Python Architecture" src={useBaseUrl('/img/robyn_api_architecture.png')} />
diff --git a/website/static/img/curve_calibrator.png b/website/static/img/curve_calibrator.png
diff --git a/website/static/img/curve_calibrator_onepager.png b/website/static/img/curve_calibrator_onepager.png