Hey! I'm working on an MVP with more features, and I hope to have it available soon. The MVP will feature a GPU cache that integrates with customers' existing AWS S3 storage through S3-compatible APIs. It is designed to consistently outperform S3's Standard storage tier in both latency and throughput for reads and writes, enabling fast access to your data. The MVP will also include a user-friendly web console for managing and monitoring storage, and an integrated payment system to make adoption easy.
In the meantime, please join my waitlist; I would love to learn more about your use cases for high-performance object storage.
Perhaps the demo video was a bit confusing. You can think of it much like a traditional CPU memory cache: the client sends a request to the server (a GPU memory cache in this case), which processes the request and then moves the data to persistent storage asynchronously as needed. I'm leveraging the parallel computing power and memory bandwidth of a GPU to do this better than previous approaches. Does that make sense?
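To make that flow concrete, here's a toy write-back sketch in plain Python (illustrative only, not my actual implementation): requests are served from an in-memory cache, and a background worker drains dirty entries to persistent storage off the request path.

```python
import queue
import threading

# Toy write-back cache sketch (illustrative only, not the real service).
cache: dict[str, bytes] = {}                 # hot data held in memory (GPU memory in my case)
dirty: "queue.Queue[str]" = queue.Queue()    # keys waiting to be persisted

def put(key: str, value: bytes) -> None:
    cache[key] = value      # acknowledged at cache speed
    dirty.put(key)          # persisted later, off the request path

def get(key: str):
    return cache.get(key)   # served straight from memory on a hit

def writeback_loop(persist) -> None:
    while True:
        key = dirty.get()
        persist(key, cache[key])   # e.g. a PUT to persistent storage (S3)

threading.Thread(target=writeback_loop, args=(lambda k, v: None,), daemon=True).start()
put("batches/00042.tfrecord", b"...")
print(get("batches/00042.tfrecord"))
```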
> the client will send a request to the server (GPU memory cache in this case)
What's inside the request? Is the request a piece of GPU memory (for example, you want to back up machine learning weights)? Or is the server using GPU memory as a cache?
I know the GPU is fast, but it doesn't have network I/O. You will need to move things back to CPU memory eventually if you need to send them to S3?
Moving things between CPU and GPU is not efficient.
And what request needs the GPU's power to preprocess? Why can't I just save things on S3 directly?
Unless you chunk a large piece of data into smaller chunks and access them on demand?
What if the server/GPU suddenly experiences a power outage? How do you guarantee data integrity?
The request will be an S3-style API request, and the contents will vary depending on the workload. For example, when OpenAI trains an LLM, every iteration requests a batch of model input data from their object storage. My service would sit in the middle with a subset of that data and respond to those requests faster.
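Concretely, the client keeps using a normal S3 client and just points it at the cache endpoint. A rough example with boto3 (the endpoint URL, bucket, and key here are made up):

```python
import boto3

# Hypothetical example: the same boto3 code the training job already uses,
# only endpoint_url points at the cache instead of s3.amazonaws.com.
s3 = boto3.client("s3", endpoint_url="http://gpu-cache.internal:9000")

resp = s3.get_object(Bucket="training-data", Key="batches/00042.tfrecord")
batch = resp["Body"].read()   # served from GPU memory on a hit, fetched from S3 on a miss
```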
Regarding performance, moving data between CPU and GPU is slow, but that can be minimized with CUDA unified memory and largely eliminated with GPUDirect RDMA, which lets the NIC transfer data to and from GPU memory directly. Since the product is a cache for S3, the client already uses S3 and probably other AWS services for their backend, so I can maximize network bandwidth by co-locating their backend service and my cache service in the same AWS availability zone.
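As a rough illustration of the unified-memory side (CuPy here; the slab size is made up): allocations can be backed by CUDA managed memory, so the driver migrates pages between host and device on demand instead of requiring an explicit copy per object.

```python
import cupy as cp

# Sketch only: back CuPy allocations with CUDA unified (managed) memory.
# The driver migrates pages between CPU and GPU on demand, which cuts down on
# explicit host<->device copies; GPUDirect RDMA goes further by letting the
# NIC read and write GPU memory directly.
cp.cuda.set_allocator(cp.cuda.MemoryPool(cp.cuda.malloc_managed).malloc)

cache_slab = cp.zeros(256 << 20, dtype=cp.uint8)   # hypothetical 256 MiB slab of cache space
```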
The GPU's computational power is useful for processing batched requests in parallel, such as metadata requests and GET/PUT/DELETE requests for small objects.
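For a feel of what "batched in parallel" means, here's a toy metadata-lookup sketch in CuPy (key naming and index layout are made up, not my real data structures): a sorted array of key hashes lives in GPU memory, and an entire batch of lookups resolves in one parallel call.

```python
import cupy as cp

# Toy sketch of batched metadata lookups on the GPU (not the real index layout).
stored_keys = [f"batches/{i:05d}.tfrecord" for i in range(100_000)]       # hypothetical keys
index = cp.sort(cp.asarray([hash(k) & 0xFFFFFFFFFFFF for k in stored_keys], dtype=cp.uint64))

batch = ["batches/00007.tfrecord", "batches/00042.tfrecord", "nope.bin"]  # one batched request
queries = cp.asarray([hash(k) & 0xFFFFFFFFFFFF for k in batch], dtype=cp.uint64)

slots = cp.searchsorted(index, queries)                   # every lookup resolved in parallel
hits = index[cp.clip(slots, 0, index.size - 1)] == queries
print(hits)                                               # expect [True, True, False]
```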
The GPU cache protects data integrity by asynchronously copying changed data to S3 at frequent intervals. Additionally, I am planning to add a write-ahead log (WAL) option that will leverage the local NVMe storage of the GPU EC2 instance I am running the service on.
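The WAL idea, sketched below (the path and record framing are illustrative, not a final format): append a checksummed record to a log on the instance's local NVMe and fsync before acknowledging the write, so a sudden power loss can only lose writes that were never acknowledged.

```python
import os
import struct
import zlib

def wal_append(log, key: bytes, value: bytes) -> None:
    """Append one length+CRC framed record and make it durable before acking."""
    body = struct.pack(">I", len(key)) + key + struct.pack(">I", len(value)) + value
    log.write(struct.pack(">II", len(body), zlib.crc32(body)) + body)
    log.flush()
    os.fsync(log.fileno())   # on local NVMe; only after this does the client get a 200

# Hypothetical path on the instance's local NVMe volume.
with open("/nvme0/gpu-cache.wal", "ab") as log:
    wal_append(log, b"batches/00042.tfrecord", b"...object bytes...")
```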
Those are my thoughts so far on how I would address these issues. I have implemented most of them, and I'm improving the service over time. I'm happy to answer additional questions.