The goal of this app is to provide a highly curated search for staying up-to-date with the latest AI resources and news.
All search results are extracted from the Ben's Bites AI newsletter, which serves as the curated data source.
How it works
A cron job runs every 24 hours to update the database.
The steps involved include:
- Crawling the source Beehiiv newsletter
- Converting each post to markdown
- Extracting and resolving unique links
- Fetching opengraph metadata for each link
- Fetching provider-specific metadata for some links (e.g. tweet text)
- Generating vector embeddings for each link using OpenAI
- Upserting all links into a Pinecone vector database
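As a rough illustration of the "extracting and resolving unique links" step above, here is a minimal Python sketch. It is not the app's actual code: the function names, the regex, and the normalization rules (lowercasing the host, dropping `utm_*` tracking parameters, fragments, and trailing slashes) are all assumptions made for the example, and it does not follow HTTP redirects, which a real resolver would likely need to do.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Matches inline markdown links such as [text](https://example.com/page).
MARKDOWN_LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^\s)]+)\)")


def normalize_url(url: str) -> str:
    """Normalize a URL so that trivially different variants compare equal.

    Assumed rules (hypothetical, for illustration only): lowercase the scheme
    and host, drop utm_* query parameters, drop the fragment, and strip any
    trailing slash from the path.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def extract_unique_links(markdown: str) -> list[str]:
    """Return normalized links from a markdown post, in first-seen order."""
    seen = set()
    unique = []
    for match in MARKDOWN_LINK_RE.finditer(markdown):
        url = normalize_url(match.group(1))
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Calling `extract_unique_links(post_markdown)` on each converted post would yield one entry per distinct link, which keeps the later metadata and embedding steps from doing duplicate work.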
We use Iframely to extract Open Graph metadata for each link, and we special-case tweet links so that we can also extract the tweet text.
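One way the tweet special case could be detected is by matching the link's host and path, sketched below in Python. The helper name, the host list, and the path pattern are assumptions for illustration; the sketch only identifies a tweet link and pulls out its ID, and leaves the actual fetching of tweet text to whatever client the app uses.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Tweet permalinks have the form /<username>/status/<numeric id>.
TWEET_PATH_RE = re.compile(r"^/[^/]+/status/(\d+)")

# Hosts treated as Twitter for this sketch (an assumed, non-exhaustive list).
TWEET_HOSTS = {
    "twitter.com",
    "www.twitter.com",
    "mobile.twitter.com",
    "x.com",
    "www.x.com",
}


def tweet_id_from_url(url: str) -> Optional[str]:
    """Return the tweet ID if the URL looks like a tweet permalink, else None."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in TWEET_HOSTS:
        return None
    match = TWEET_PATH_RE.match(parts.path)
    return match.group(1) if match else None
```

Links for which `tweet_id_from_url` returns `None` would go through the generic metadata path only.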
Once we have all of the links locally, we upsert them into a Pinecone vector database for semantic search.
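The upsert step can be pictured as turning each link into a record with an ID, an embedding, and some metadata, then sending the records in batches. The Python sketch below shows only that shaping and batching. The field names on each link (`title`, `description`, `tweet_text`), the hashed-URL ID scheme, and the batch size are assumptions, and `embed` is a hypothetical callable standing in for the OpenAI embedding call so that the example makes no network requests.

```python
import hashlib
from typing import Callable, Iterator


def link_id(url: str) -> str:
    """Derive a stable ID from the URL so re-running the job overwrites
    the same record instead of creating a duplicate (assumed scheme)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def build_records(
    links: list[dict], embed: Callable[[str], list[float]]
) -> list[dict]:
    """Build id/values/metadata records from link dicts.

    `embed` is a hypothetical stand-in for the embedding API call.
    """
    records = []
    for link in links:
        # Embed whatever descriptive text is available for the link.
        text = " ".join(
            part
            for part in (
                link.get("title"),
                link.get("description"),
                link.get("tweet_text"),
            )
            if part
        )
        records.append(
            {
                "id": link_id(link["url"]),
                "values": embed(text),
                "metadata": {"url": link["url"], "title": link.get("title", "")},
            }
        )
    return records


def batched(records: list[dict], size: int = 100) -> Iterator[list[dict]]:
    """Yield fixed-size chunks so each upsert request stays small."""
    for start in range(0, len(records), size):
        yield records[start : start + size]
```

Each chunk from `batched(build_records(links, embed))` would then be passed to the vector database client's upsert call. Because the IDs are derived from the URLs, a daily re-run updates existing entries rather than duplicating them.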