You're on the verge of a feature-full AI Browser
-
JustGoran last edited by
Keep going!
I think you need to consider integrating vision models that would allow Aria to be a full-on browser agent, one that can go do tasks I give it. Local is better for security, but anything that works is good enough.
I'm considering any kind of web-app form as a use case.
I should be able to tell Aria, "Hey, make a ticket for this conversation I'm having in Slack (browser)," and it should be able to go to my ticketing system and just generate a ticket, with no APIs in between.
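To make it concrete, here's roughly the loop I'm picturing (just a sketch, nothing Aria actually does today): grab a screenshot, ask a vision model (ideally a local one) what to do next, then perform the click or keystroke. The ask_vision_model function is a made-up placeholder, and pyautogui is only standing in for whatever input layer the browser would really use.

```python
# Purely illustrative sketch: screenshot -> vision model -> action, in a loop.
# ask_vision_model() is a made-up placeholder for a (preferably local) model;
# pyautogui (a real library) stands in for whatever input layer the browser uses.
import base64
import io

import pyautogui  # screenshots plus synthetic mouse/keyboard input


def screenshot_png_b64() -> str:
    """Grab the current screen and encode it as base64 PNG for the model."""
    image = pyautogui.screenshot()
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def ask_vision_model(goal: str, screenshot_b64: str) -> dict:
    """Placeholder: wire this to a local vision model of your choice.

    Expected to answer with something like:
      {"action": "click", "x": 512, "y": 300}
      {"action": "type", "text": "Ticket: follow up on Slack thread ..."}
      {"action": "done"}
    """
    raise NotImplementedError("no real model wired up in this sketch")


def run_agent(goal: str, max_steps: int = 20) -> None:
    """Look at the screen, ask the model for the next step, perform it, repeat."""
    for _ in range(max_steps):
        step = ask_vision_model(goal, screenshot_png_b64())
        if step["action"] == "click":
            pyautogui.click(step["x"], step["y"])
        elif step["action"] == "type":
            pyautogui.write(step["text"], interval=0.03)
        elif step["action"] == "done":
            break


if __name__ == "__main__":
    run_agent("Create a ticket in my ticketing system summarising the open Slack thread")
```

Keeping the model local means the screenshots never leave the machine, which is the security angle I mean.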
-
Starsfall last edited by
I actually feel like this is a great idea. I think you should also be able to do these things:
Let's say I ask Aria to control my volume {like up or down an amount}; I feel she should be able to do it.
Along with the GX player, she should be able to find us music and play it.
A few more ideas I had that should be integrated are below, in bullet points:
- Asking her to make a sidebar with the research tabs she found
- Asking her to change your GX theme
- Having more of a tab instead of a sidebar for her, similar to C.AI
- Her acting as a sort of welcomer on startup
- Her having a voice? For the visually impaired
- IF she gets a voice, the voice should have a visual aspect to it, like a circle that moves with the soundwaves
- Her being able to help and guide visually impaired people, and describe photos?
- Her being able to manage messages and emails for you, along with other accounts and things {IF you opt in for it, obviously}
- Her being able to take on a different personality and different traits to better fit the person she is assisting {Maybe even a new name}
- Her being able to learn about the individual she helps, to better fit them as well?
{I know most of this might not be possible, but the future is ahead of us, and all of these features would be very useful to many <3}
-
Lilybrooks last edited by
Absolutely agree: integrating vision models is the next logical step to make Aria (or any AI browser) truly agentic. The ability to understand and interact with visual web elements would unlock full browser automation. Local execution would definitely boost trust and privacy. Web-app workflows like creating tickets, filling forms, or extracting data all become possible without needing backend APIs. It's essentially bridging human-like browsing with AI precision.
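For the "without needing backend APIs" part, the point is that everything happens through the web UI itself. Here's a minimal sketch of that idea, assuming a hypothetical ticketing page and invented selectors (hard-coded here with Playwright, whereas an agent would have a vision model pick the targets instead):

```python
# Illustrative only: filing a ticket by driving the web form itself, no ticketing API.
# The URL and the CSS selectors are invented for the example; Playwright is a real library.
from playwright.sync_api import sync_playwright


def file_ticket(summary: str, details: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://tickets.example.com/new")  # hypothetical ticketing page
        page.fill("#summary", summary)                # hypothetical form fields
        page.fill("#description", details)
        page.click("button[type=submit]")
        browser.close()


if __name__ == "__main__":
    file_ticket("Follow up on Slack thread", "Summary of the conversation goes here.")
```

Swap the hard-coded selectors for elements or coordinates chosen by a vision model and you get the agentic version of the same flow.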
-
Hattyfatner last edited by
How utterly pointless.
You can't reach over and press a key?
This is great for building large databases of people's information.
But it's hardly worth it just because you can't be bothered to lift your hand and place it on the keyboard.