Insights on spaCy, Prodigy and Generative AI by Ines Montani

Nitika Sharma 11 Jul, 2024
4 min read

In our latest episode of Leading with Data, we are thrilled to host Ines Montani, a renowned developer in the field of AI and NLP technology. As the co-founder and CEO of Explosion, and a co-developer of the leading open-source library spaCy and the innovative annotation tool Prodigy, Ines brings a wealth of knowledge and experience. This episode delves into the evolution of spaCy and Prodigy, the unique structure of Explosion, and the transformative impact of generative AI. Join us as we explore insights from the front lines of NLP and decode the future of data science with Ines Montani.

You can listen to this episode of Leading with Data on popular platforms like Spotify, Google Podcasts, and Apple. Pick your favorite to enjoy the insightful content!


Key Insights from our Conversation with Ines Montani 

  • The evolution of spaCy and Prodigy has centered on enabling developers to build custom NLP solutions that run in-house.
  • Explosion’s unique structure combines open-source libraries, consulting, and specialized tools like spaCy LLM to address industry-specific NLP challenges.
  • Generative AI has brought impressive advancements but also highlighted the need for structured data and custom tooling in industry applications.
  • The NLP industry is likely to see a shift towards smaller, more efficient models and increased discussions on data privacy and AI ethics.
  • For organizations, the decision between open-source models and big tech APIs should be based on the specific needs of their applications and the ability to control and understand their AI systems.
  • Young professionals entering the NLP field should focus on foundational skills and subject matter expertise to adapt to the evolving landscape of AI and machine learning.

Join our upcoming Leading with Data sessions for insightful discussions with AI and Data Science leaders!

Let’s look into the details of our conversation with Ines Montani:

How has the journey of spaCy and Prodigy evolved since 2017?

Since 2017, our focus has been on making it easier for users to not just use off-the-shelf models but to train their own. We’ve seen spaCy evolve with more components and use cases, especially in extracting structure from text. Our goal has been to enable developers to build custom solutions that they can run in-house, just like developing code. We’ve also been addressing the challenges that come with black box models and APIs, empowering developers to take back control of their NLP stack.
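As a minimal sketch of what "extracting structure from text" looks like in spaCy, the snippet below builds a blank English pipeline with a rule-based entity component, so no pretrained model download is required. The pattern and label here are purely illustrative:

```python
import spacy

# Start from a blank English pipeline (no pretrained weights needed).
nlp = spacy.blank("en")

# Add a rule-based entity recognizer and register an example pattern.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion"}])

# Processing text yields structured Doc objects, not free-form output.
doc = nlp("Ines Montani co-founded Explosion.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)  # → [('Explosion', 'ORG')]
```

In practice teams would train statistical components on their own annotations (for example, data labeled in Prodigy) rather than rely on hand-written rules, but the interface stays the same: text in, structured `Doc` objects out.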

What is the unique structure of Explosion and how do the different components come together?

Explosion is structured around spaCy, our open-source library, and includes consulting and spaCy LLM. We’ve always aimed to build a business on top of spaCy, offering more than just the library while keeping it open source. We didn’t want to lock off features or offer only support, as that would compromise the ease of use. Instead, we developed Prodigy, an annotation tool designed as a developer tool, and we engage in consulting to apply our tools to real-world use cases. This helps us ensure that what we’re building is genuinely useful.

How have you personally experienced the generative AI wave?

The generative AI wave has been impressive, especially seeing how scaling up models can yield such good results. It’s been a mix of surprise and anticipation, as we’ve been closely watching how it fits into NLP workflows and what specific problems it solves. While there’s excitement about few-shot and zero-shot learning, we believe that structured data remains crucial, and there’s still a need for custom tooling around generative AI.

What are some common pain points in implementing generative AI in industry applications?

One major pain point is prompt engineering, which is still more of an art than a science. Another is the specificity required for business applications, as general-purpose models often don’t deliver good results for specialized terminology. Additionally, the dependency on large models and APIs can be economically and operationally challenging, with issues like lack of data privacy and deterministic output. We’re addressing these with spaCy LLM, which provides structured prediction tasks and a familiar output for developers.
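To give a feel for how spaCy LLM frames generation as a structured prediction task, a configuration along these lines wires an LLM-backed NER component into an ordinary spaCy pipeline. The registry names and labels below are illustrative; consult the spacy-llm documentation for the exact versions available:

```ini
[nlp]
lang = "en"
pipeline = ["llm"]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = ["PERSON", "ORG", "PRODUCT"]

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v1"
```

The point is the output format: instead of free text, the component returns `Doc` objects with `doc.ents` populated, the same structured interface developers already consume from regular spaCy pipelines, which makes it easy to swap the LLM out for a smaller trained model later.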

What trends do you foresee in the NLP industry?

I expect a movement towards smaller models, as there’s a lot of potential for them to be just as effective for specific tasks. There will likely be more discussion around data privacy and explainability, as well as a pushback against the monopolization of AI by big tech. Open-source models will continue to play a significant role, and we’ll see a return to focusing on workflows and tooling that support operations and product questions.

What excites you about the future applications of NLP, and what concerns you?

I’m excited about the potential for significantly better systems in structuring unstructured text and the advancements in multimodal data. However, I’m concerned about the overestimation of AI capabilities and the societal impact of misleading perceptions about AI. The misuse of technology and the propagation of bugs are more immediate threats than dystopian scenarios of AI dominance.

How should organizations decide between open-source models and relying on big tech APIs?

Organizations should consider whether they need generative model capabilities at runtime or if they can move this dependency to development. If real-time generation isn’t crucial, open-source models can be more economical and offer greater control. Investing time in creating high-quality data can lead to models that outperform large generative models on specific tasks, making open-source a viable option for many companies.

What advice would you give to young people entering the NLP domain?

Focus on developing core skills like programming and problem-solving rather than chasing the latest technologies. Understanding the basics of language and having subject matter expertise can be invaluable. Think from first principles and prioritize skills that will remain relevant regardless of technological trends.

Summing Up

Our conversation with Ines Montani offered deep insights into the dynamic world of NLP and AI. From the evolution of spaCy and Prodigy to the future trends in the NLP industry, Ines shared invaluable perspectives on the importance of structured data, custom tooling, and the balance between open-source models and big tech APIs. Her advice to young professionals emphasizes foundational skills and subject matter expertise. As we navigate the ever-evolving landscape of AI and machine learning, the insights from Ines Montani will undoubtedly serve as a guiding light. We wish all our listeners the best of luck in their data science journeys!

For more engaging sessions on AI, data science, and GenAI, stay tuned with us on Leading with Data.

Check our upcoming sessions here.


