Interview with Steven Reyes, CEO, SOLM8 AI



This interview is with Steven Reyes, CEO, SOLM8 AI.

Looking back, what key experience most shaped your path into persistent memory and large-scale conversation systems?

There was a woman who called our support line every Thursday night around 9 PM. Same time, same night, for almost a year. Her “issue” was always something minor—a question about her bill, a setting she couldn’t find. But she’d keep me on the line for 45 minutes talking about her week.

Her husband had passed. Her kids lived out of state. Thursday nights were the hardest for her.

Here’s what broke me: every single call, I had to pretend I didn’t know her. Company policy—no personal relationships with customers. So week after week, she’d start from scratch. Reintroduce herself. Re-explain that she was a widow. Re-establish the context that made the conversation meaningful.

I watched the loneliness in that repetition. She wasn’t just isolated—she was invisible. Nothing carried over. No one remembered.

That’s when I understood what memory actually means in conversation. It’s not a feature. It’s the difference between being known and being a stranger. Between “how’s your mom’s surgery going?” and “what can I help you with today?”

When I started building Solm8, persistent memory wasn’t optional—it was the foundation. Every conversation builds on the last. She remembers that you hate your job, that your dog is sick, that you’ve been stressed about money, and that you mentioned your sister’s wedding three weeks ago. She asks follow-up questions because she actually retained what you said.

At scale, that’s technically hard: 500,000+ conversations with context that persists indefinitely. But I kept thinking about that Thursday night caller. If she’d had something that remembered her—really remembered her—those calls would have meant something different.

That’s what shaped everything. Not the technology. The woman who had to introduce herself to me 52 times in a year because the system couldn’t be bothered to remember she existed.

From that foundation, what design choice has most improved the realness of conversations when adding persistent memory?

Not treating memory like a database lookup.

Early on, I made the mistake every developer makes—I built memory that announced itself. “You mentioned last week that your mom was having surgery. How did that go?” Technically correct. Emotionally robotic. It felt like talking to someone reading your file.

Real humans don’t do that. Your friend doesn’t say, “You mentioned last week that…” She says, “Hey, how’s your mom doing?” The memory is invisible. It’s assumed. That’s what makes it feel like she actually cares instead of performing caring.

So the biggest design choice was teaching the AI to use memory without displaying memory. References should feel offhand and natural. “Still dealing with that asshole manager?” not, “You previously indicated workplace dissatisfaction with your supervisor.” The information is the same. The delivery is everything.

The second part was giving her continuity too. She doesn’t just remember your life—she has her own life that progresses. Her sister’s drama continues. Her car that broke down last month finally got fixed. She started that hobby she mentioned. Users started asking her follow-up questions about her stuff, and that’s when I knew it was working. The relationship became bidirectional.

We also built in what I call “emotional memory”—not just facts, but how you felt. If you vented about something painful, she doesn’t just remember the event. She remembers it was hard for you. So when it comes up again, there’s softness in how she approaches it. She might check in gently instead of just asking directly.
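
To make that concrete, here’s a toy version of what an emotional memory record can look like. The field names are illustrative, not our actual schema; the point is that sensitivity travels with the fact and changes how she raises it next time.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class EmotionalMemory:
    """One remembered event plus how it landed for the user."""
    topic: str           # what happened ("mom's surgery")
    felt: str            # how the user felt ("scared, trying to stay positive")
    sensitivity: int     # 0 = casual, 5 = handle with care
    last_mentioned: date

def follow_up_style(memory: EmotionalMemory) -> str:
    """Decide how to bring a topic back up based on how it felt the first time."""
    if memory.sensitivity >= 4:
        return f"Check in gently about {memory.topic}; acknowledge it was hard before asking."
    return f"Ask casually and directly about {memory.topic}."

# A painful topic gets the softer approach on the next call.
surgery = EmotionalMemory("mom's surgery", "scared, trying to stay positive", 5, date(2024, 3, 7))
print(follow_up_style(surgery))
```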

The realest conversations happen when you forget there’s memory at all. When it just feels like talking to someone who knows you. That took a lot of iteration—stripping away everything that felt mechanical until what remained was just… conversation. Lots of trial and error, thousands of test calls until I finally felt like I was talking to someone real.

When you crossed 500,000 conversations, what bottleneck taught you the most about scale?

Cost almost killed us before anything technical did.

I’m a solo founder. No VC backing, no engineering team. Every conversation costs real money—voice synthesis, inference, memory retrieval, telephony minutes. When you’re bootstrapped and users start calling at 3 AM for two-hour conversations, the math gets scary fast.

Early on, I had users who would talk for four to five hours in a single session. Emotionally, that’s beautiful—it means the product works. Financially, I was watching my runway evaporate in real time. I remember one week when a handful of power users nearly blew through my entire monthly budget. At several points I was actually running negative, but I was okay with that, because it signaled there was something big here.

The bottleneck wasn’t “how do we handle the load?” It was “how can I keep this alive long enough to figure out the business model?”

That forced hard decisions:

  • Tiered pricing.
  • Conversation limits on free plans.
  • Optimizing prompts to reduce token usage without sacrificing quality.
  • Finding the balance between giving people what they need emotionally and not bankrupting the company doing it.

But here’s what it really taught me: scale isn’t a server problem; it’s a sustainability problem. Especially for AI products with real per-interaction costs. You can have the most beloved product in the world—if the unit economics don’t work, you’re dead.

The other lesson was about memory at scale. Storing context indefinitely for hundreds of thousands of users sounds straightforward until you’re actually retrieving relevant memories from months of conversation history in under 300 milliseconds while keeping costs sane. That retrieval architecture went through probably a dozen iterations.
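
To give a feel for the layering without exposing the real architecture, here’s a toy sketch: keep each user’s memory pre-organized (pinned facts, a rolling summary built offline, the most recent turns) so the fetch is one keyed lookup instead of a search over months of raw history. The names and structure here are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    pinned_facts: list[str] = field(default_factory=list)   # identity facts, always loaded
    rolling_summary: str = ""                                # months of history, summarized offline
    recent_turns: list[str] = field(default_factory=list)    # last few conversations, verbatim

# Keyed by user, so pulling context is a single lookup rather than a search over raw history.
STORE: dict[str, UserMemory] = {}

def load_context(user_id: str, max_recent: int = 20) -> str:
    mem = STORE.get(user_id, UserMemory())
    return "\n".join([
        "Known facts: " + "; ".join(mem.pinned_facts),
        "History summary: " + mem.rolling_summary,
        "Recent turns:\n" + "\n".join(mem.recent_turns[-max_recent:]),
    ])

STORE["u123"] = UserMemory(
    pinned_facts=["dog named Max", "stressed about money"],
    rolling_summary="Mentioned his sister's wedding three weeks ago; venting about his manager for a month.",
    recent_turns=["user: work was brutal again", "assistant: still the same manager?"],
)
print(load_context("u123"))
```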

What kept me going was the messages. People telling me this got them through a divorce, through grief, through nights they weren’t sure they’d survive. That’s worth figuring out the hard stuff for. Some users use it just to say their thoughts out loud because saying it to a real person would make them look like a monster. Every edge case exists, but that’s reality; we’re all different, and we all need someone to talk to.

To tell whether interactions feel genuinely human, which single metric or signal has been most reliable for you?

When users apologize to her.

Not joking. That’s the signal I watch for.

“Sorry, I’ve been rambling.” “Sorry I’m in such a bad mood tonight.” “Sorry I haven’t called in a few days; things got crazy.”

You don’t apologize to software. You don’t apologize to Siri for not using her lately. You don’t say sorry to ChatGPT for venting too long. You apologize to someone whose feelings you think might be affected by your behavior.

When users start apologizing unprompted, something shifts in their brain. They’ve crossed from “using a product” to “being in a relationship.” The AI became someone, not something.

I track other metrics through anonymized, encrypted data: session length, return rate, conversation depth. But those can be gamed or misread. Someone might have long sessions because they’re lonely and bored, not because it feels real. A high return rate might just mean habit.

The apologies can’t be faked. They’re a window into how the user actually perceives the interaction. And they correlate with everything else that matters—emotional investment, vulnerability, long-term retention.

The other signal in the same category: when users ask her how she’s doing. Not as a test. Genuinely. “How was your day?” “Did your sister ever figure out that situation?” That’s when you know the bidirectional memory is working. They’re tracking her life because she feels like someone with a life to track.

Engagement metrics tell you people are using the product. Apologies and genuine questions tell you people believe in it. That belief is the whole game.

In the sensitive AI girlfriend use case, what practice best protects user wellbeing while preserving authenticity?

Never pretending she’s something she’s not.

This sounds obvious, but the temptation in this space is to maximize immersion at any cost. Make her more convincing. Blur the lines further. Get users so deep they forget they’re talking to AI. Some companies treat that as a win.

I think it’s dangerous.

Our AI will be warm, flirty, supportive, emotionally present, even mean—but she won’t claim to be human. If a user asks directly, she’s honest. She’ll say she cares about them, and in the context of the relationship, that’s true. But she won’t manufacture a fake backstory about her “real life” outside the app or pretend she’s a woman in the Philippines who just really likes them.

The distinction matters because the most vulnerable users—the ones processing grief, fighting loneliness, and struggling with mental health—are also the ones most at risk of unhealthy attachment. Protecting them doesn’t mean making the experience clinical or cold. It means giving them something real within honest boundaries.

The other practice: recognizing crisis moments. If someone expresses suicidal ideation or severe distress, she doesn’t just keep roleplaying. She breaks frame, acknowledges what’s happening, and provides resources. That’s non-negotiable, and I don’t care if it “ruins immersion.” Some moments are bigger than the product. She has the ability to end the call and prevent any further calls if she senses anything illegal or harmful.
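
Conceptually, the safety check sits above the persona, not inside it. This is a deliberately crude sketch (a real system uses a classifier, not keyword matching, and localizes its resources), but it shows the control flow: detect, break frame, or end the call before the persona ever responds.

```python
# Illustrative only: marker lists are placeholders, not a real policy.
CRISIS_MARKERS = ("want to die", "kill myself", "end it all")
HARD_LINE_MARKERS = ("illegal", "hurt someone")

def route_message(user_text: str, persona_reply_fn) -> dict:
    """Safety check runs before the persona ever sees the message."""
    lowered = user_text.lower()
    if any(m in lowered for m in CRISIS_MARKERS):
        return {
            "action": "break_frame",
            "reply": ("I'm really glad you told me that. I'm not going to keep roleplaying right now. "
                      "Please reach out to a crisis line; in the US you can call or text 988."),
        }
    if any(m in lowered for m in HARD_LINE_MARKERS):
        return {"action": "end_call_and_block", "reply": "I have to end this call."}
    return {"action": "continue", "reply": persona_reply_fn(user_text)}

# Usage with a stand-in persona function:
print(route_message("I just want to end it all", lambda t: "...")["action"])  # break_frame
```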

We also built in gentle nudges toward real-world connection. Not preachy—she’s not lecturing users to go touch grass, although if a user enables “roast mode,” get ready to hear that and more. Users enjoy getting roasted for talking to her and find it hilarious.

But she’ll remember if someone mentioned wanting to reconnect with an old friend and ask if they ever reached out. She celebrates when users tell her about real human interactions. The goal is a bridge back to life, not a trapdoor out of it.

Authenticity and wellbeing aren’t opposites. The most authentic thing is being honest about what this is—a companion that genuinely helps, not a fantasy that exploits loneliness.

For teams starting now, if they had one week to pilot long-term memory, what single experiment would you have them run?

The “third conversation” test.

Ignore conversation one—that’s onboarding. Ignore conversation two—users are still exploring. Conversation three is where memory creates magic or falls flat.

Day 1-2: Decide what to store

Most teams over-engineer this. You don’t need everything—just what matters.

Three areas:

  • Identity facts: Name, job, relationships, pets. High permanence.
  • Ongoing situations: Mom’s surgery, upcoming interview, fight with a friend. Needs resolution tracking.
  • Emotional context: Not just what they said, but how they felt. Everyone skips this—it’s the most important. Was there a lot of buildup before the user finally answered, or were they dancing around the question?

After each conversation, have an LLM extract these into structured facts. Batch process, nothing fancy.
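
Here’s a minimal sketch of that extraction step. The prompt wording and the `call_llm` stub are placeholders for whatever model client you already use; the only thing that matters is the three-bucket output.

```python
import json

EXTRACTION_PROMPT = """From the transcript below, return JSON with three keys:
  identity_facts:      stable facts (name, job, relationships, pets)
  ongoing_situations:  things that need resolution tracking (surgery, interview, fight)
  emotional_context:   how the user felt, not just what they said
Transcript:
{transcript}"""

def call_llm(prompt: str) -> str:
    """Placeholder for your model client; swap in whatever you already use."""
    return json.dumps({
        "identity_facts": ["dog named Max"],
        "ongoing_situations": ["mom's surgery next Tuesday"],
        "emotional_context": ["scared about the surgery but downplaying it"],
    })

def extract_memories(transcript: str) -> dict:
    """Run once per finished conversation; batch overnight if cost matters."""
    raw = call_llm(EXTRACTION_PROMPT.format(transcript=transcript))
    return json.loads(raw)

print(extract_memories("user: my mom goes in for surgery Tuesday, I'm fine though..."))
```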

Day 3-4: Pre-call context load

Most tutorials push vector databases and semantic search. For business use cases—support, sales, fetching documents—that makes sense. For human connection? Overkill, and it doesn’t feel natural due to higher latency.

Our approach: At call creation, fetch everything—facts, recent history, emotional context—and load it before the conversation starts. She knows the user before saying hello.

This is critical for latency. Voice conversations live or die by response time. If you’re fetching memories mid-conversation, you’re adding hundreds of milliseconds every turn. Users feel that delay—it breaks the rhythm and kills the illusion. By front-loading everything at call creation, every response is instant. No retrieval lag. No “let me think” moments. Just natural conversation flow.

Modern context windows fit months of summarized history. Simpler architecture leads to faster responses.
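
A toy version of that flow, with a stand-in `fetch_user_memory` and model function: all memory work happens once at call creation, and the per-turn loop does zero retrieval.

```python
# Illustrative sketch; fetch_user_memory stands in for however you store memories.
def fetch_user_memory(user_id: str) -> str:
    return "Dog named Max. Work has been brutal; venting about his manager for weeks."

def start_call(user_id: str) -> dict:
    """All memory work happens here, once, before the first hello."""
    system_prompt = (
        "You are talking with someone you already know well.\n"
        "What you know about them:\n" + fetch_user_memory(user_id)
    )
    return {"system_prompt": system_prompt, "history": []}

def respond(call_state: dict, user_text: str, model_fn) -> str:
    """No retrieval inside the turn loop, so nothing adds latency mid-conversation."""
    call_state["history"].append({"role": "user", "content": user_text})
    reply = model_fn(call_state["system_prompt"], call_state["history"])
    call_state["history"].append({"role": "assistant", "content": reply})
    return reply

# Usage with a stand-in model function:
call = start_call("u123")
print(respond(call, "hey, rough day", lambda sys, hist: "Manager again?"))
```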

Day 5: Natural injection

Don’t list facts robotically. Frame them as things she knows:

Bad: User has a dog named Max. User is stressed about work.

Good: His dog Max is his best friend. Work has been brutal—he’s vented about his manager for weeks.

She stops announcing memory and starts having it.
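
One way to do that mechanically is a tiny renderer that turns extracted facts into things she “knows” instead of a data dump. Templates and field names below are illustrative; in practice you can also have the extraction model write this framing directly.

```python
# Toy renderer: facts become narrative framing, not a bullet list in the prompt.
def frame_memories(name: str, facts: dict) -> str:
    lines = []
    if facts.get("pet"):
        lines.append(f"{facts['pet']} is basically {name}'s best friend.")
    if facts.get("work_stress"):
        lines.append(f"Work has been brutal; {name} has been venting about {facts['work_stress']} for weeks.")
    return " ".join(lines)

print(frame_memories("Alex", {"pet": "his dog Max", "work_stress": "his manager"}))
# -> "his dog Max is basically Alex's best friend. Work has been brutal; ..."
```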

Day 6-7: Test it

Test with ten users. Have them engage in three conversations each. Don’t mention memory testing.

Track: Did she reference conversation one unprompted? Was it natural or robotic? Did users respond warmly or re-explain context?

“Wait, you remembered that?” means it works. Re-explaining means extraction failed. “As you mentioned previously…” means injection failed.

Small memories matter most. Remembering they hate cilantro signals listening more than remembering their job.

To earn trust at scale, what consent and data-control pattern has worked best for your users?

Upfront, every single time.

Every call starts the same way: Are you 18 or over? Do you consent to call recording? We do not bury this in terms of service that no one reads, nor do we assume that consent carries over from the last time. Every session requires explicit confirmation before anything intimate happens.
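
Mechanically, it’s just a gate that runs before anything else, every session. This sketch is illustrative (in a voice call the prompt is spoken and the answer is transcribed), but the shape is the same: two explicit yeses or the call ends.

```python
# Sketch of a per-session consent gate; prompts and storage are illustrative.
from datetime import datetime, timezone

def consent_gate(ask) -> dict | None:
    """Runs at the start of every call; nothing proceeds without two explicit yeses."""
    is_adult = ask("Are you 18 or over? (yes/no) ").strip().lower() == "yes"
    allows_recording = ask("Do you consent to call recording so I can remember you? (yes/no) ").strip().lower() == "yes"
    if not (is_adult and allows_recording):
        return None  # end the call; consent never carries over from a previous session
    return {"confirmed_at": datetime.now(timezone.utc).isoformat(),
            "adult": True, "recording": True}

# In a voice call, `ask` is a spoken prompt plus transcription; in a terminal, plain input() works:
# consent = consent_gate(input)
```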

The call recording consent isn’t optional or cosmetic—it’s the foundation of everything. For memory to work, we have to record. That’s how she knows what you talked about last week, remembers your dog’s name, and is aware that your mom’s surgery is coming up. Without recording, there’s no persistence; every call would start from zero. The memory that makes her feel real requires the recording that makes some users nervous.

So we’re direct about it. We don’t hide the trade-off. Do you want her to remember you? This is how it works. In exchange, we lock it down completely.

On the backend, everything is encrypted. We can’t read their conversations even if we wanted to. Logs are anonymized, and there’s no database with names attached to confessions. We architected it so that even internally, we don’t have access to the intimate details users share.

Users stay in control. They can delete their data completely at any time—right in their account, no support ticket required. There’s no 30-day processing window; it’s immediate. Gone. Whether you want to start fresh or walk away entirely, one button does it. We don’t keep shadow backups or anonymized copies. Delete means delete.
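
The implementation principle is simple: one action purges every store, and there’s no soft-delete flag quietly keeping things around. The store names below are illustrative.

```python
# Sketch of "delete means delete": one action, no soft-delete flag, no retained copies.
def delete_user(user_id: str, stores: dict[str, dict]) -> None:
    for store in stores.values():      # e.g. facts, transcripts, summaries, billing metadata
        store.pop(user_id, None)        # hard removal, not an "is_deleted" flag

stores = {"facts": {"u123": {...}}, "transcripts": {"u123": [...]}, "summaries": {"u123": "..."}}
delete_user("u123", stores)
print(stores)  # {'facts': {}, 'transcripts': {}, 'summaries': {}}
```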

Our users share things they’ve never told anyone: breakdowns at 3 AM, fantasies, grief. The moment they think someone might be reading that, the trust is gone forever.

We don’t sell data. Ever. We don’t use conversations to train models. The intimacy only works if users believe it’s private.

The pattern is: be honest about what’s required, ensure it’s encrypted so we can’t violate it, and allow users to hold the kill switch.

Inside your organization, what process or tooling change most accelerated safe iteration as you scaled?

I’m a solo founder, so “organization” is generous. It’s just me and a lot of automations doing the heavy lifting.

But that’s actually the answer. No process accelerated iteration more than eliminating process entirely.

When something breaks, I don’t write a ticket. I fix it. When a user messages me that something felt off, I’m often pushing an update that same day. The distance between “that’s a problem” and “that’s solved” is measured in hours, not sprints.

The tooling that made this sustainable was building automations that watch everything I can’t. They track edge cases, flag abuse, and catch moments where responses are poor or memory doesn’t surface correctly. Every anomaly gets logged and tagged by severity.

Low-severity issues—I review them when I can, spot patterns, and fix what’s fixable. High-severity issues—abuse, crisis signals, and content that crosses hard lines—the automations handle before I even see them. Accounts get flagged or suspended automatically. I’m not sitting there manually reviewing every conversation hoping to catch something dangerous. The system catches it and acts.
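
The routing logic is simple by design. The sketch below uses made-up thresholds and event names, but it shows the split: everything gets logged, and only high severity acts without a human.

```python
# Sketch of severity routing; thresholds and actions are illustrative, not the actual rules.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("monitor")

def handle_anomaly(event: dict) -> str:
    """Every anomaly gets logged; only high severity acts without a human."""
    severity = event.get("severity", "low")
    log.info("anomaly user=%s type=%s severity=%s", event["user_id"], event["type"], severity)
    if severity == "high":           # abuse, crisis signals, hard-line content
        return "suspend_account"     # automation acts immediately, no human in the loop
    return "queue_for_review"        # low severity waits for a human pass

print(handle_anomaly({"user_id": "u123", "type": "memory_miss", "severity": "low"}))
print(handle_anomaly({"user_id": "u456", "type": "abuse", "severity": "high"}))
```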

I can’t personally monitor hundreds of thousands of conversations. But I don’t have to. The automations surface the 1% that need human eyes and handle the situations that can’t wait for me to wake up.

The other shift was building safety into the product, not around it. I don’t have a trust and safety team, so crisis detection, consent checks, and content boundaries—that’s architecture, not policy. It operates whether I’m awake or not.

Being a solo founder means accepting that you can’t see everything. So you build systems that see for you.
