About
This app provides a curated search for keeping up with the latest AI resources and news.
All search results are extracted from Ben's Bites AI Newsletter, which serves as the curated data source.
How it works
A cron job runs every 24 hours to update the database.
The steps involved include:
- Crawling the source Beehiiv newsletter
- Converting each post to markdown
- Extracting and resolving unique links
- Fetching opengraph metadata for each link
- Fetching provider-specific metadata for some links (e.g. tweet text)
- Generating vector embeddings for each link using OpenAI
- Upserting all links into a Pinecone vector database
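As a rough illustration of the "extracting and resolving unique links" step, here is a minimal Python sketch that pulls inline links out of a markdown post and de-duplicates them. It is not the app's actual code: the function names, the `utm_` tracking-parameter rule, and the normalization choices are assumptions made for this example, and it does not follow redirects, which a real resolver probably would.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Matches inline markdown links such as [label](https://example.com/page).
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\((https?://[^\s)]+)\)")

# Hypothetical rule: treat query parameters with these prefixes as tracking noise.
TRACKING_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop tracking params, fragment, and trailing slash."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def extract_unique_links(markdown: str) -> list[str]:
    """Return normalized links in first-seen order, without duplicates."""
    seen = set()
    links = []
    for match in MARKDOWN_LINK.finditer(markdown):
        url = normalize_url(match.group(1))
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


post = (
    "Read [A](https://Example.com/post/?utm_source=x), "
    "[B](https://example.com/post) and [C](https://example.org/a?id=1)."
)
print(extract_unique_links(post))
# ['https://example.com/post', 'https://example.org/a?id=1']
```

The first two links collapse into one entry because they differ only in host casing, a trailing slash, and a tracking parameter.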
We're using IFramely to extract opengraph metadata for each link, and we also special-case tweet links to extract the tweet text.
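To show what opengraph metadata looks like, the sketch below reads `og:*` meta tags from a page's HTML using only the Python standard library. This is a local stand-in for illustration, not how the app does it (the app delegates this to IFramely); the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property, minus the "og:" prefix.
            self.metadata.setdefault(prop[3:], content)


def parse_opengraph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.metadata


page = """
<html><head>
  <meta property="og:title" content="Hello">
  <meta property="og:description" content="World" />
  <meta name="viewport" content="width=device-width">
</head></html>
"""
print(parse_opengraph(page))
# {'title': 'Hello', 'description': 'World'}
```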
Once we have all of the links locally, we upsert them into a Pinecone vector database for semantic search.
Semantic Search
Semantic search is powered by OpenAI's text-embedding-ada-002 embedding model and Pinecone's hosted vector database.
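The idea behind the lookup can be sketched without any network calls: each link is stored as a vector, the query is embedded into the same space, and the closest vectors win. The Python sketch below uses tiny hand-written 3-dimensional vectors in place of real embeddings (text-embedding-ada-002 returns 1536-dimensional vectors) and an in-memory dict in place of Pinecone. Cosine similarity is an assumption here; the metric configured on the app's index is not stated above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=3):
    """Return the k (link_id, score) pairs most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), link_id)
        for link_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(link_id, score) for score, link_id in scored[:k]]


# Toy stand-in for the vector database: hypothetical link ids mapped to embeddings.
toy_index = {
    "link-a": [1.0, 0.0, 0.0],
    "link-b": [0.0, 1.0, 0.0],
    "link-c": [0.9, 0.1, 0.0],
}

results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print([link_id for link_id, _ in results])
# ['link-a', 'link-c']
```

A hosted vector database does the same ranking, but with approximate nearest-neighbor indexes so it stays fast as the number of links grows.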
License
This webapp is open source. MIT © Travis Fischer
All link data is extracted from Ben's Bites AI Newsletter and is licensed under CC BY-NC-ND 4.0.