My name is Shelby. I'm a solutions engineer. I may have made myself redundant :)
Shelby-as-a-Service (SaaS) is a slack/discord bot that users can use to answer domain specific questions. Related documents are retrieved via semantic search from a vector database and ‘stuffed’ into a GPT prompt along with the query. Call it ‘context enriched queries’ if you want to be fancy, but it’s actually pretty simple. It does work surprisingly well though.
It sounds easy to do, and it is! But then you also have to figure how to scrape/chunk/parse all your docs properly, make sure the bot returns links to the docs, build a devops workflow, and a few dozen other “easy” things…. And by then you realize you’ve built something that might be pretty useful to others and here we are :)
Easy Deployment: Just provide API keys and document source URLs.
Automated Document Processing: Scrapes, processes, and uploads data from sources like websites, gitbook, sitemaps, and OpenAPI specs.
Advanced Document Retrieval: Generates additional search keywords to improve semantic search. It also uses the superior (for this use case) sparse/dense embedding method. Finally, it checks document relevance by asking GPT if they’re relevant. (Don’t worry, logit biasing means these extra API calls don’t add significant response times.)
Shelby-as-a-Service (SaaS) is a slack/discord bot that users can use to answer domain specific questions. Related documents are retrieved via semantic search from a vector database and ‘stuffed’ into a GPT prompt along with the query. Call it ‘context enriched queries’ if you want to be fancy, but it’s actually pretty simple. It does work surprisingly well though.
It sounds easy to do, and it is! But then you also have to figure how to scrape/chunk/parse all your docs properly, make sure the bot returns links to the docs, build a devops workflow, and a few dozen other “easy” things…. And by then you realize you’ve built something that might be pretty useful to others and here we are :)
Easy Deployment: Just provide API keys and document source URLs.
Automated Document Processing: Scrapes, processes, and uploads data from sources like websites, gitbook, sitemaps, and OpenAPI specs.
Advanced Document Retrieval: Generates additional search keywords to improve semantic search. It also uses the superior (for this use case) sparse/dense embedding method. Finally, it checks document relevance by asking GPT if they’re relevant. (Don’t worry, logit biasing means these extra API calls don’t add significant response times.)