
Week 3a: Questions - Artificial Intelligence Allows us to Measure Innovation as Surprise #7

Open
jamesallenevans opened this issue Jan 7, 2025 · 30 comments

Comments

@jamesallenevans
Contributor

Post your (<150 word) question for class about or inspired by the following readings:

@aveeshagandhi

Based on the first two readings for the week.

In what ways do various approaches to assessing novelty, like the hypergraph models focusing on surprising content-context pairings (Shi et al.) and the "search space" frameworks examining structural or combinatorial novelty (Foster et al.), influence our perception of innovation in different fields? For instance, Shi et al. contend that significant advancements frequently happen when scientists from one area utilize solutions from another field, resulting in surprising and impactful findings. In contrast, Foster et al. emphasize the context-dependent and field-relevant characteristics of novelty, indicating that evaluations of novelty are significantly influenced by localized views of previous knowledge and creative processes. Do these methods indicate complementary understandings—like the possibility for interdisciplinary "expeditions" to spark unexpected combinations—or do they expose fundamental tensions, especially concerning the acknowledgment and reward of novelty? Additionally, in what ways could these frameworks guide approaches to encourage innovation in situations where boundary-crossing concepts encounter institutional or cultural opposition?

@sijia23333

Given that Shi and Evans demonstrate in their Nature Communications paper, Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines, that cross-disciplinary collaborations often lead to higher-impact research, but institutional incentives (such as awards and peer review) tend to favor conservative work within established fields, how can scientific evaluation and reward systems be restructured to better support both rigor in domain expertise and the creative potential of cross-disciplinary contributions? Specifically, what mechanisms could balance the quality controls inherent in specialist evaluations with the need to encourage "surprising" research that bridges disciplinary boundaries, without undermining scientific standards or stifling innovation?

@siyakalra830

siyakalra830 commented Jan 18, 2025

The Foster et al. paper emphasizes that novelty is inherently subjective and influenced by the context of prior knowledge and inventive processes. Building on this idea, how can we develop novelty measures that effectively capture the dynamic evolution of "inventive spaces"? The sources describe how complex components, through repeated use and "black-boxing," become fundamental units that reshape the landscape for future innovation. These black-boxed components, initially novel, become integrated into the "adjacent possible," driving the emergence of even more groundbreaking inventions. Can we devise methods to track this dynamic process, identifying emerging black-boxed components and anticipating how they might reshape the frontiers of innovation? Moreover, how can we leverage this understanding to guide the evolution of inventive spaces and foster groundbreaking discoveries?

@malvarezdemalde

The articles "Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines" and "Surprise! Measuring Novelty as Expectation Violation" emphasize how groundbreaking innovations often emerge from unexpected combinations of ideas, particularly when researchers step outside their disciplinary comfort zones. These "knowledge expeditions," which involve solving problems by drawing insights from distant fields, are strongly associated with outsized scientific and technological impact. However, the current academic and institutional frameworks often reward traditional, within-discipline work, creating barriers to fostering such unconventional and high-risk collaborations. How can research institutions, funding agencies, and policymakers develop strategies to encourage cross-disciplinary collaboration and support researchers willing to explore uncharted intellectual territories? Could targeted initiatives, such as dedicated funding for interdisciplinary projects, structured platforms for knowledge exchange, or recognition and reward systems for transdisciplinary breakthroughs, help mitigate the risks while unlocking the potential for transformative discoveries?

@joycecz1412

The article "Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines" discusses how "successful teams [of surprising breakthroughs] not only have members from multiple disciplines, but also members with diverse backgrounds who stitch interdisciplinary teams together".

Given the identified importance of cross-field interaction in creating revolutionary breakthroughs, how can our systems and scientific institutions further encourage interdisciplinary research and facilitate cross-pollination of ideas between various faculties? Are there better incentives to promote more cross-field interactions? Are there certain fields that are less or more prone to interdisciplinary crossover? Maybe we could look at the cross-field citations for different subjects.

@kbarbarossa

The article "Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines" emphasizes how breakthroughs often result from researchers crossing disciplinary boundaries to address unfamiliar challenges. With the growing integration of artificial intelligence across industries, what role does AI play in shaping these "knowledge expeditions"? Will its widespread adoption encourage more surprising, high-impact collaborations by connecting distant fields, or could it lead to greater self-reliance within disciplines, reducing the frequency of such boundary-crossing innovations? Additionally, how might AI itself serve as both a tool and a barrier in fostering these transformative interactions?

@JaslinAg

The paper “Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines” found that innovation is driven by connections between unusual research areas. This research generates novel insights and is more likely to become a hit paper. However, scientists tend to amplify the work they are familiar with: they cite well-known research to gain credibility and reinforce dense fields. There is also a geographical component to this, with citations more likely when scientists/inventors are within 25 miles of each other.

How did the foundation of large universities – where distinct fields of knowledge, which are dense and self-reinforcing, are concentrated in one area – impact the creation of novel and surprising research? What can we learn from these insights that could be applied to the increasing use of video communication and AI tools?

@vmittal27

In Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines, we learn about a hypergraph model to predict breakthrough papers through surprising content and contexts. The content of a paper is operationalized through the use of keywords, which the authors identify as an imperfect method of measuring the content. These keywords don't factor in equations, figures, etc.

My question concerns how multimodal LLMs might aid new ways of operationalizing content. Since LLMs are capable of understanding large amounts of unstructured data (and many different data formats), they could create more effective measures of a paper's contents. However, I wonder: might this use of LLMs lead to other problems, where the model hallucinates details that aren't even there, or where, because these papers contain novel content an LLM doesn't understand, it ignores the novelty and focuses on the context and preexisting ideas instead?
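A minimal sketch of what an embedding-based operationalization of "content" could look like, assuming abstracts are available as plain text and using the sentence-transformers library; the model name and the "distance from a field centroid" proxy are illustrative assumptions, not the hypergraph measure from the paper:

```python
# Illustrative sketch only: measure a paper's "content surprise" as its
# embedding distance from the centroid of its home field's recent papers.
# This is a toy proxy, not the Shi & Evans hypergraph measure.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def content_surprise(paper_abstract: str, field_abstracts: list[str]) -> float:
    """Cosine distance between a paper and the centroid of its field."""
    field_vecs = model.encode(field_abstracts)        # (n, d) array
    paper_vec = model.encode([paper_abstract])[0]     # (d,) array
    centroid = field_vecs.mean(axis=0)
    cos = np.dot(paper_vec, centroid) / (
        np.linalg.norm(paper_vec) * np.linalg.norm(centroid)
    )
    return 1.0 - cos  # larger = more surprising relative to the field

# Example usage (toy strings):
# content_surprise("Graph neural networks for drug repurposing",
#                  ["Kinase inhibitor screening ...", "Protein docking ..."])
```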

@jessiezhang39

In the paper “Surprising combinations of research contents and contexts”, the authors formalized the concept of “knowledge expedition”, where scientists from one discipline venture to solve problems for an audience from a distant field. Such collective abduction proved key to achieving innovative breakthroughs and outsized impact. Further, an important distinction was made between trans-disciplinary and interdisciplinary research, with the former more likely to achieve pronounced scientific success. This makes me wonder how adequate institutions and infrastructure can be put in place to incentivize rather than alienate seemingly “estranged ideas” from scientific outsiders, especially given some researchers’ tendency to strengthen the inner connections of a subfield rather than exploring outward?

@cskoshi

cskoshi commented Jan 20, 2025

In "Surprising combinations of research contents and contexts", there seems to be a tension between the "conservative influence" and the concept of "knowledge expedition". If knowledge expedition is indeed scientists "travelling to a distant other and addressing a problem familiar to the new audience, but with a surprising approach", and if that is indeed a cause for advance, would the conservative influence that awards give provide a negative effect on advance? It seems to me that awards would distort the incentives for scientists to stay within their fields in order to "gain favor and appear relevant to reviewers", rather than conducting such knowledge expeditions. Should we hence rethink how we approach the metrics of award-giving?

Additionally, I found Fig 4a interesting. Why do career and team novelty show notably lower hit rates after a certain point (around 60)? What does the presence of this maximum say about the correlation between team/career novelty and hit probability? Is there an "optimal" level of novelty beyond which there are negative returns? If so, why?

@mskim127

mskim127 commented Jan 20, 2025

“Simulating Subjects: The Promise and Peril of AI Stand-ins for Social Agents and Interactions” evaluates the efficacy of using AI agents as proxies for real-life populations. It appears to me, however, that by shifting the subject of discussion from the social agent to the economic agent, we can mitigate some of the pitfalls of using LLMs to simulate certain demographics, whether from atemporality, bias, etc. While questioning Republicans and Democrats on matters of ethics might lead to wildly different answers, it might be reasonable to assume that their reactions to their landlord increasing rent would not vary quite as much. Given the more homogeneous nature of the economic agent, how effective will these simulations be in acquiring the kind of economic knowledge, in the form of counterfactuals, that has eluded social scientists for so long? I would expect this to depend on the degree to which individuals are similar as economic agents: too similar, and this method might provide little predictive upside compared to the more simplified models used in conventional economics, while losing too much in terms of explanatory power and parsimony.

@yangkev03

In the reading “Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines”, we learn how science and technology are associated with surprising breakthroughs under a framework that analyzes the quality of a paper in relation to different forms of novelty. One such form of novelty comes from the idea of a knowledge expedition, in which scientists move from one intellectual area to another, providing a differentiated approach to a familiar problem.
 
My question for the week revolves around how formal institutions of research can maximize the length of knowledge expeditions given that their length has a causal relationship with major scientific breakthroughs. An example of intentional design for maximizing spontaneous interactions is Steve Jobs's design of Apple Park where buildings were multi-purposed and connected through a main walkway in order to increase the amount of conversation between workers in different business functions. Can institutions of research adopt similar principles to increase knowledge expeditions as well? Additionally, if expeditions are necessary to disrupt the frontier, does interdisciplinary education or collaboration with various fields lead to greater innovation? In the same vein, do countries that specialize in some field or area of research lose out on these forces?

@nsun25

nsun25 commented Jan 20, 2025

In ‘"Surprise! Measuring Novelty by Simulating Discovery.” 2020. Jacob Foster, Feng Shi and James Evans’, I learned that novelty is a critical component of scientific, technological, artistic, and economic advancements. Despite its importance, novelty remains inadequately studied, measured, and theorized. The study indicates that novelty measures should be tailored to different domains and highlights the potential of Bayesian approaches for nuanced novelty assessments. Given the framework's emphasis on diverse and domain-specific novelty measures, how might we design adaptive policies for funding and supporting R&D initiatives that align with the distinct novelty assessment criteria of various fields?

@darshank-uc

darshank-uc commented Jan 20, 2025

In Surprise! Measuring Novelty as Expectation Violation, the authors introduce multiple frameworks to assess novelty: one uses a combinatorial approach to identify variance in the ways features are connected, and another compares impact to novelty through similarity between patent classes. Most inventions are incremental updates to previous patents (or feature combinations), and therefore their impact factor or novelty factor may not be quantitatively significant. How can the comprehensive novelty model introduced by the authors be adapted to measure synergies between slightly novel patents in generating impact? One novel patent may not revolutionize an industry, but the way in which it interacts with previous patents no longer considered novel in the same field, given an updated knowledge base, could be far more influential. While the authors recognize that novelty and impact influence each other's perceptions, synergistic groups of patents considered as novel, as opposed to individual patents, could subvert this relationship.
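One way to make the combinatorial reading of novelty concrete is to score each patent by how rare its pairs of technology classes are in the prior corpus. The sketch below is a simplified illustration under that assumption, not the authors' comprehensive model; the class codes and counting scheme are hypothetical.

```python
# Toy combinatorial-novelty score: a patent is "surprising" to the extent
# that its technology-class pairs have rarely co-occurred before.
import math
from collections import Counter
from itertools import combinations

def pair_counts(prior_patents: list[list[str]]) -> Counter:
    """Count co-occurrences of class pairs across a prior corpus."""
    counts = Counter()
    for classes in prior_patents:
        for a, b in combinations(sorted(set(classes)), 2):
            counts[(a, b)] += 1
    return counts

def novelty(patent_classes: list[str], counts: Counter, total: int) -> float:
    """Mean surprisal (-log2 probability) of the patent's class pairs."""
    surprisals = []
    for a, b in combinations(sorted(set(patent_classes)), 2):
        p = (counts[(a, b)] + 1) / (total + 1)   # add-one smoothing
        surprisals.append(-math.log2(p))
    return sum(surprisals) / len(surprisals) if surprisals else 0.0

prior = [["A61K", "C07D"], ["A61K", "C07D"], ["G06F", "H04L"]]
counts = pair_counts(prior)
total = sum(counts.values())
print(novelty(["A61K", "H04L"], counts, total))   # rare pairing -> high score
```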

@pauline196

In the bias section, the paper “Simulating Subjects: The Promise and Peril of AI Stand-ins for Social Agents and Interactions” suggests that using original, web-trained models can help address biases introduced during fine-tuning, such as those that make AI appear “nicer” and “more helpful.” However, a potential challenge with the base model itself is that the training data may be disproportionately influenced by viral content - surprising or newsworthy information that spreads rapidly and garners excessive attention. This can skew the model’s representation of social opinions and behaviors. What strategies could be implemented to mitigate this issue and ensure a more balanced and representative simulation of human interactions and opinions?

@jesseli0

There is a somewhat famous quote: "Once a measure becomes a target, it ceases to be a good measure." We might consider this advice when using novelty as a metric for the impact of a researcher. “Surprise! Measuring Novelty by Simulating Discovery” uses novelty as a measure of the impact or innovativeness of research. This might contribute to the growing replication crisis: replication is an important part of research, but it is not as incentivized as novel research. Without a backbone of replication papers, a lot of this novel research is in limbo, as we have yet to confirm what the researchers initially saw. Novel results are still valuable in advancing our understanding of the world around us, even against existing paradigms; however, we are starting to treat novelty as a target rather than the truth (e.g., researchers p-hacking to publish non-null results), and all these novel results are not very useful if they are not representative of reality. Is there a way we can balance incentivizing novel research while not disincentivizing replication papers?

@nmkhan100

Inspired by "Note on the Transformer architecture underlying modern Large Language Models" (2025) by James Evans and Austin Kozlowski, I’m curious about how this technology impacts collaboration across different fields. We’ve seen how models like GPT can process and generate text in impressive ways, but how might this ability to synthesize information influence areas like the humanities or social sciences, where computational tools aren’t as commonly used? Could these models help bridge gaps between disciplines, leading to new and unexpected discoveries? What are the potential benefits or downsides of relying on AI in areas traditionally driven by human interpretation?

@willowzhu

The model from this paper deviates from work that focuses on the institutional structures that influence discovery in science. It generates normal discoveries as combinations of prior knowledge, represented as nodes. It is a generative hypergraph model that extends the mixed-membership stochastic block model into high dimensions, characterizing complete combinations of contents and contexts.

I have a few follow up questions that build upon the work in this study. In terms of data sets, the model used biomedical sciences, physical sciences, and inventions. Would the researchers have achieved similar results and correlations if they used data sets in other fields like history and literature?

Furthermore, the researchers said in the discussion section that they would like to test their model and measures on papers that did not make it through that process. I wonder whether this has been done and what the results might have been when they cast a wider net for their data set. Finally, the results of this model demonstrate that scientific subfields may defend their internal approaches and understanding against invasion from outsiders. Is there a tradeoff here that stifles innovation?
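For readers trying to picture what "a generative hypergraph model ... characterizing complete combinations of contents and contexts" means operationally, here is a deliberately simplified toy: each paper draws a journal (context) and a set of keywords (content) from latent topic mixtures, and surprise is the negative log-probability of the observed combination under the fitted model. This is a stripped-down illustration of the general idea only, not the actual high-dimensional mixed-membership model from the paper.

```python
# Toy generative sketch: a paper = (journal, keyword set) drawn via latent
# topics; a combination's surprise = -log P(combination) under the model.
import numpy as np

rng = np.random.default_rng(0)
K, n_journals, n_keywords = 3, 5, 20                  # topics, contexts, contents

topic_journal = rng.dirichlet(np.ones(n_journals), size=K)   # P(journal | topic)
topic_keyword = rng.dirichlet(np.ones(n_keywords), size=K)   # P(keyword | topic)
topic_prior = rng.dirichlet(np.ones(K))                      # P(topic)

def combination_surprise(journal: int, keywords: list[int]) -> float:
    """-log P(journal, keywords), marginalizing over latent topics."""
    p = 0.0
    for k in range(K):
        pk = topic_prior[k] * topic_journal[k, journal]
        for w in keywords:
            pk *= topic_keyword[k, w]
        p += pk
    return -np.log(p)

# The same keyword set scores as more or less surprising depending on the
# journal (context) it appears in.
print(combination_surprise(0, [1, 2, 3]))
print(combination_surprise(4, [1, 2, 3]))
```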

@rbeau12

rbeau12 commented Jan 21, 2025

"Simulating Subjects" discusses several problems with using LLMs to simulate subjects in social research. Many of these problems are due to the quality of data available for models to be trained on; the internet is biased and has many gaps in cultural knowledge. Further, the two internal methods used to overcome these shortcomings, "finetuning" using a smaller dataset and "activation steering" through example inputs, require high-quality data to be effective. How can social scientists create data that efficiently addresses knowledge gaps? What types of data are most beneficial to model performance? Should scientists change their research strategies to create data that is optimal for model use (rather than human interpretation)?

@michellema02

In "Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines," Feng Shi and James Evans present a model showing that two dimensions of novelty—context novelty and content novelty—strongly predict the impact of both patents and scientific research. However, they highlight key differences between science and technology. For instance, due to weaker boundaries in technology, context novelty is less predictive of patent impact, and patents, unlike papers, are equally likely to cite both distant and nearby sources.

Given these differences, I’m curious: what is the relationship between scientific impact and economic impact (through innovation)? Do hit papers receive significant citations from both other papers and patents, or are they skewed towards one? Moreover, since patents inherently exhibit greater context novelty, do papers with higher content novelty more effectively drive innovation?

@pedrochiaramitara

In the article Surprise! Measuring Novelty as Expectation Violation, the authors focus on the concept of the "black box", which I personally found very interesting. This is the process through which complex systems or components become simple and self-contained. They seem to hold it in a positive light, but how might the process of "black boxing" slow innovation by making it harder to change the internal workings of a tested and proven invention? For instance, when a component becomes "black boxed," its internals are often hidden from future innovators through patents, potentially discouraging improvements. A large company might want to keep the inner workings of its product from competitors. Could this lead to a stagnation of innovation in certain fields, where companies that made a successful product keep it hidden from others while not investing in improvements?

@chrislowzhengxi

In Surprise! Measuring Novelty by Simulating Discovery, Foster, Shi, and Evans argue that novelty is fundamentally tied to violations of expectations based on prior knowledge and is closely linked to the element of surprise within a specific context. Applying this concept to Large Language Models (LLMs), I have several questions: how can we construct a robust framework to assess the novelty of their outputs? Should novelty evaluation rely primarily on statistical deviations from the patterns and probabilities in the training data, or should user expectations and subjective perceptions also play a role in determining what counts as novel? Balancing these two approaches is challenging, particularly in ensuring that outputs remain relevant and coherent while also surprising and engaging users. Also, how might incorporating iterative user feedback into LLM training and evaluation processes better capture the nuanced concept of surprise? What risks or biases might emerge when defining what qualifies as “surprising” or “novel” in different contexts?
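One concrete baseline for the "statistical deviation" half of this question is the average per-token surprisal of a text under a reference language model. A minimal sketch with Hugging Face transformers; the GPT-2 reference model is just an illustrative stand-in, and a real evaluation would need a model and corpus matched to the domain:

```python
# Sketch: score a text's average per-token surprisal (in nats) under a
# reference LM. Higher average surprisal = more statistically "unexpected"
# relative to that model, one crude ingredient of novelty.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # illustrative reference model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_surprisal(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids the model returns mean cross-entropy,
        # i.e. average per-token surprisal in nats.
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

print(mean_surprisal("The cat sat on the mat."))
print(mean_surprisal("Quantum thimbles marinate recursive lighthouses."))
```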

@anacpguedes

Shi and Evans (2023) highlight the importance of interdisciplinary "knowledge expeditions," where researchers solve problems for audiences outside their field, often producing groundbreaking innovations. The knowledge graph OpenAlex catalogs research papers, concepts, and citation networks, aiming to reveal connections across disciplines. This is likely to facilitate such expeditions by mapping relationships between distant disciplines. However, given the challenges of identifying impactful yet surprising intersections, is it actually possible to systematize the discovery of boundary-crossing collaborations, or are the limitations of such a tool such that it can serve only as a knowledge base rather than an innovator?

@florenceukeni

Something that’s “novel,” as these readings define it, specifically involves an element of surprise: an unexpected, improbable breakthrough that challenges existing beliefs (“surprisal” = −log(probability)). But if surprise is framed as a violation of expectations, how does that framing of novelty influence whether a new discovery is embraced and applauded or dismissed by the research community? Both Shi et al. (2020 and 2023) encourage cross-domain collaborations, but they also highlight how domains can have so many differences between them that ideas crossing academic boundaries might seem too far outside the accepted norm, essentially ramping up the “violation” aspect until breakthroughs risk rejection. So how exactly do researchers find the balance between causing enough surprise to encourage discovery but not so much that they alienate or confuse a field’s gatekeepers? Is it just a game of trial and error and hoping for the best? Or are there techniques or strategies for introducing “novelty” across domains in a way that drives transformative growth?
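For readers who want the arithmetic behind that definition, a quick worked illustration (base-2 logs, so surprisal is measured in bits; the probabilities are made up for the example):

```python
# Surprisal = -log2(p): halving the probability adds one bit of surprise.
import math

for p in (0.5, 0.1, 0.01, 1 / 1024):
    print(f"p = {p:<10} surprisal = {-math.log2(p):.2f} bits")
```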

@dnlchen-uc

Shi and Evans demonstrate that scientific breakthroughs often occur as a result of interdisciplinary cross-pollination. In social science research, researchers employ multi-agent systems to simulate interactions between different LLMs. Based on the "novelty" principles expressed by Shi and Evans, would using a diverse assortment of LLMs trained on different data (i.e., a combination of ChatGPT, Gemini, etc.) in a simulation result in differentiated emergent behavior? In multi-agent systems, how should researchers balance agent diversity with the added computational and logistical difficulties associated with managing multiple proprietary language models? Do these considerations change when working with authored interactions and human-AI systems?
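A minimal sketch of what "a diverse assortment of LLMs" could look like in a multi-agent loop, with the model backends abstracted behind a hypothetical query_model function; the backend names, personas, and round-robin turn-taking are all illustrative assumptions, not an established framework:

```python
# Toy round-robin dialogue between agents backed by different (hypothetical)
# backends. query_model is a placeholder for whatever client each API needs.
def query_model(backend: str, prompt: str) -> str:
    """Placeholder: route the prompt to the named backend and return its reply."""
    raise NotImplementedError(f"wire up the {backend} client here")

class Agent:
    def __init__(self, name: str, backend: str, persona: str):
        self.name, self.backend, self.persona = name, backend, persona

    def respond(self, transcript: list[str]) -> str:
        # Short memory window: persona plus the last few turns.
        prompt = self.persona + "\n" + "\n".join(transcript[-6:])
        return query_model(self.backend, prompt)

def simulate(agents: list["Agent"], opening: str, turns: int) -> list[str]:
    transcript = [opening]
    for t in range(turns):
        agent = agents[t % len(agents)]          # round-robin turn-taking
        transcript.append(f"{agent.name}: {agent.respond(transcript)}")
    return transcript

# Diversity here means each agent sits on a different backend / training corpus.
agents = [
    Agent("A", "backend_1", "You are a cautious reviewer."),
    Agent("B", "backend_2", "You are a speculative theorist."),
]
```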

@jacobchuihyc

In Surprise! Measuring Novelty as Expectation Violation, the authors dig into how novelty—whether it’s configurational, structural, or combinatorial—drives innovation by challenging what people expect. The reading talks about how different fields have unique ways of fostering this novelty that are shaped by things such as institutional norms and disciplinary boundaries, which I found intriguing as those structures can either help or hinder progress. On the other hand, in Surprising Combinations of Research Contents and Contexts, the focus shifts to those unexpected pairings called "knowledge expeditions," where ideas or methods from completely different fields come together and end up having tremendous, sometimes boundary-breaking impacts.

Given the findings of these articles, how do policymakers begin to build regulatory frameworks that support both kinds of innovation? How do they encourage gradual, step-by-step changes like configurational novelty, while still leaving room for those big, bold breakthroughs that come from crossing boundaries? Are there examples out there where regulations have successfully struck that balance? And what lessons can we pull from those cases to make policies that don’t accidentally stifle creativity or innovation?

@Adrianne-Li

Adrianne-Li commented Jan 21, 2025

Given the emphasis on novelty as both context-sensitive and subjective in the Foster et al. paper and the hypergraph model used by Shi and Evans to predict impactful combinations of research contents and contexts, how can interdisciplinary collaboration platforms be designed to maximize the discovery of high-surprise innovations? Specifically, what role could AI-driven tools, which simulate combinatorial and structural novelty, play in creating and facilitating these platforms while addressing potential biases in novelty perception across disciplines?

@sabrinamatsui31

Based on the methodologies presented in “Surprising Combinations of Research Contents and Contexts” and “Surprise! Measuring Novelty as Expectation Violation,” how do the authors conceptualize the role of “surprise” in driving impactful discoveries? Specifically, consider the hypergraph model’s focus on content and context combinations and the notion of novelty as an expectation violation. Both papers highlight the predictive power of measuring surprise, but what are the potential challenges or limitations of applying these models across diverse scientific or technological fields? Furthermore, in the context of interdisciplinary research, how do these studies address the balance between fostering innovation through unconventional combinations and the practical constraints of peer review and funding systems?

@alan-cherman

What is a simulation? It is the full output of a model that describes a real-world process. As such, AI stand-ins model human interactions. But what is the model for human social interactions? A model generally requires the builder to be able to classify or score simulated interactions as a good or bad simulation of the real deal. A lot of AI models do this with a "thumbs up, thumbs down" user feedback approach, but I was wondering if there are any quantitative measures of what human speech looks like that are used to train the output of chat AI models.

Zipf's law of word frequency distribution in human text corpora comes to mind. Are these empirically observed quantitative linguistic phenomena used to measure the performance of Chat AI models?
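Zipf's law is easy to check empirically, which is part of why it is tempting as a sanity metric for generated text: rank the words of a corpus by frequency, and the log-log rank/frequency relationship is roughly a straight line with slope near -1. Whether any chat-model training pipeline actually uses such a check is an open question; the sketch below just shows how one would measure it.

```python
# Check Zipf's law on a text: the frequency of the r-th most common word
# should fall roughly as 1/r, i.e. slope near -1 on a log-log plot.
import math
import re
from collections import Counter

def zipf_slope(text: str, top_n: int = 1000) -> float:
    words = re.findall(r"[a-z']+", text.lower())
    freqs = [count for _, count in Counter(words).most_common(top_n)]
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    # Least-squares slope of log(freq) vs. log(rank).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Apply to a human corpus and to model transcripts, then compare slopes:
# print(zipf_slope(open("corpus.txt").read()))
```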

@e-uwatse-12

Based on the reading “Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines”: How do the context-dependent manifestations of novelty and the interplay between prior knowledge, inventive processes, and unexpected combinations of research contents and contexts shape our understanding of innovation, and what methodological frameworks can be developed to systematically capture these dynamics across diverse fields in order to predict transformative breakthroughs and their subsequent impact?
