<h2>About</h2>
<p>This app provides a curated search engine for keeping up with the latest AI resources and news.</p>
<p>All search results are extracted from <a href="https://www.bensbites.co/">Ben's Bites AI Newsletter</a>, which serves as the curated data source.</p>
<h2>How it works</h2>
<p>A cron job runs every 24 hours to update the database.</p>
<p>The steps involved include:</p>
<ol>
  <li>Crawling the source <a href="https://www.bensbites.co/">Beehiiv newsletter</a></li>
  <li>Converting each post to markdown</li>
  <li>Extracting and resolving unique links</li>
  <li>Fetching Open Graph metadata for each link</li>
  <li>Fetching provider-specific metadata for some links (e.g. tweet text)</li>
  <li>Generating vector embeddings for each link using OpenAI</li>
  <li>Upserting all links into a Pinecone vector database</li>
</ol>
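<p>As a rough illustration of the "extracting and resolving unique links" step above, here is a minimal Python sketch. It is not the app's actual code: the names <code>MD_LINK</code>, <code>normalize</code>, and <code>extract_unique_links</code> are hypothetical, and "resolving" is approximated here by simple URL normalization rather than by following redirects.</p>

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical pattern: matches inline markdown links of the form [text](url).
MD_LINK = re.compile(r"\[[^\]]*\]\((https?://[^\s)]+)\)")


def normalize(url: str) -> str:
    """Lowercase the scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def extract_unique_links(markdown: str) -> list[str]:
    """Return normalized links in first-seen order, without duplicates."""
    seen: set[str] = set()
    links: list[str] = []
    for match in MD_LINK.finditer(markdown):
        url = normalize(match.group(1))
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    post = (
        "[a](https://Example.com/x/) and "
        "[b](https://example.com/x#top) and "
        "[c](https://example.org)"
    )
    print(extract_unique_links(post))
```

<p>Keeping first-seen order makes the output stable between runs, which can help when the same posts are re-crawled every day.</p>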
<p>We use <a href="https://iframely.com/">Iframely</a> to extract Open Graph metadata for each link, and we special-case tweet links to extract the tweet text.</p>
<p>Once we have all of the links locally, we upsert them into a <a href="https://www.pinecone.io/">Pinecone</a> vector database for semantic search.</p>
<h3>Semantic Search</h3>
<p>Semantic search is powered by <a href="https://platform.openai.com/docs/guides/embeddings/">OpenAI's <code>text-embedding-ada-002</code> embedding model</a> and <a href="https://www.pinecone.io/">Pinecone's hosted vector database</a>.</p>
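<p>To make the idea concrete, the sketch below ranks a few stored vectors against a query vector by cosine similarity, which is the kind of nearest-neighbor lookup a vector database performs at scale. It is an in-memory illustration only: the function names <code>cosine_similarity</code> and <code>top_k</code> are hypothetical, the three-dimensional vectors are toy stand-ins for real embeddings, and no OpenAI or Pinecone API is called.</p>

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], index: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k (link_id, score) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), link_id) for link_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(link_id, score) for score, link_id in scored[:k]]


if __name__ == "__main__":
    # Toy index: link id mapped to a made-up embedding vector.
    index = {
        "link-a": [1.0, 0.0, 0.0],
        "link-b": [0.0, 1.0, 0.0],
        "link-c": [1.0, 1.0, 0.0],
    }
    for link_id, score in top_k([1.0, 0.0, 0.0], index, k=2):
        print(link_id, round(score, 4))
```

<p>In the real app the query text would first be embedded with the same model used for the links, so that query and link vectors live in the same space.</p>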
<h2>License</h2>
<p>This webapp is <a href="https://github.com/transitive-bullshit/bens-bites-ai-search">open source</a>. MIT © <a href="https://twitter.com/transitive_bs">Travis Fischer</a></p>
<p>All link data is extracted from <a href="https://www.bensbites.co/">Ben's Bites AI Newsletter</a> and is licensed under <a href="https://creativecommons.org/licenses/by-nc-nd/4.0/">CC BY-NC-ND 4.0</a>.</p>

