Ben's Bites AI Search

About

The goal of this app is to provide a highly curated search for staying up-to-date with the latest AI resources and news.

All search results are extracted from the Ben's Bites AI Newsletter, which serves as the app's only data source.

How it works

A cron job runs every 24 hours to update the database.

Each update runs the following steps:

  1. Crawling the source Beehiiv newsletter
  2. Converting each post to markdown
  3. Extracting and resolving unique links
  4. Fetching opengraph metadata for each link
  5. Fetching provider-specific metadata for some links (e.g. tweet text)
  6. Generating vector embeddings for each link using OpenAI
  7. Upserting all links into a Pinecone vector database
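
As a rough illustration of step 3, here is a minimal sketch of pulling unique links out of a post's markdown. It is not the app's actual code: the function names, the regex, and the normalization rules (dropping fragments, lowercasing the host, stripping trailing slashes) are all assumptions made for this example, and it does not follow redirects, which "resolving" in the real pipeline may well involve.

```python
import re
from urllib.parse import urldefrag, urlsplit, urlunsplit

# Matches inline markdown links of the form [text](url) or [text](url "title").
LINK_RE = re.compile(r'\[[^\]]*\]\((https?://[^\s)]+)(?:\s+"[^"]*")?\)')


def normalize_url(url: str) -> str:
    """Hypothetical normalization so trivially different URLs compare equal."""
    url, _fragment = urldefrag(url)      # drop "#section" fragments
    parts = urlsplit(url)
    path = parts.path.rstrip("/")        # treat "/a/" and "/a" as the same page
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def extract_unique_links(markdown: str) -> list[str]:
    """Return normalized links in first-seen order, without duplicates."""
    seen: set[str] = set()
    links: list[str] = []
    for match in LINK_RE.finditer(markdown):
        url = normalize_url(match.group(1))
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    post = (
        "See [A](https://Example.com/a/) and [B](https://example.com/a#top) "
        "and [C](https://x.org/p?q=1)."
    )
    print(extract_unique_links(post))
```

Keeping first-seen order makes the output deterministic across runs, which helps when the same post is re-crawled by the daily job.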

We use IFramely to extract opengraph metadata for each link, and we special-case tweet links to extract the tweet text.

Once we have all of the links locally, we upsert them into a Pinecone vector database for semantic search.

Semantic Search

Semantic search is powered by OpenAI's text-embedding-ada-002 embedding model and Pinecone's hosted vector database.
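
To make the idea concrete, here is a small in-memory sketch of what a vector search does conceptually: embed the query, score it against every stored link embedding, and return the closest matches. It stands in for the hosted services and does not call OpenAI or Pinecone. The tiny 3-dimensional vectors, the `top_k` helper, and the choice of cosine similarity as the metric are assumptions for illustration (text-embedding-ada-002 actually returns 1536-dimensional vectors).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3):
    """Return the k (score, url) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), url) for url, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy "embeddings" keyed by hypothetical link URLs.
    index = {
        "https://example.com/llm-agents": [1.0, 0.0, 0.0],
        "https://example.com/image-models": [0.0, 1.0, 0.0],
        "https://example.com/agent-frameworks": [0.9, 0.1, 0.0],
    }
    query = [1.0, 0.0, 0.0]  # pretend this is the embedded search query
    for score, url in top_k(query, index, k=2):
        print(f"{score:.3f}  {url}")
```

A hosted vector database replaces the linear scan above with an approximate nearest-neighbor index, so queries stay fast as the number of links grows.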

License

This webapp is open source. MIT © Travis Fischer

All link data is extracted from Ben's Bites AI Newsletter and is licensed under CC BY-NC-ND 4.0.