This interview is with Runbo Li, CEO, Magic Hour AI.
For readers at Featured, could you introduce yourself—your role as a CEO in higher education—and how your background in economics, data analysis, and teaching shapes the way you build and evaluate AI in the browser?
I’m the co-founder and CEO of Magic Hour, an AI creation platform focused on making powerful image and video tools accessible in the browser. My background in economics taught me to think on the margin—to weigh trade-offs carefully, compare opportunity costs, and focus on what actually changes behavior.
My experience in data analysis taught me to care about causal inference, not just correlation, so I try to be rigorous about whether a feature or model improvement is truly creating value. I also spent time teaching, which shaped how I think about product design: if something is confusing, slow, or opaque, most users will not trust it.
When I evaluate AI in the browser, I examine it through a practical lens:
- Is the quality meaningfully better?
- Is the speed acceptable?
- Is the cost justified?
- Does the user benefit enough for the ROI to be clearly positive?
What pivotal experiences moved you from researcher/teacher to leading a team that deploys AI tools in education?
My path was less about moving from research and teaching into AI and more about realizing that the same habits of mind carried over. Economics trained me to think in trade-offs and on the margin. Data analysis trained me to ask what is actually causing an outcome, not just what looks correlated with it. Teaching shaped my belief that if something is not intuitive, people will not use it well, no matter how powerful it is.
What pushed me into leading a team was seeing how often powerful technology failed at the point of use. The models were impressive, but the real question was whether they were useful, affordable, fast enough, and clear enough for real people. That made the work feel less academic and more operational. I became interested not just in what AI could do in theory but in how to deploy it in a way that meaningfully improves outcomes.
So the shift happened when I stopped thinking only about capability and started thinking about adoption, trust, and ROI. That is still how I lead today. I want our team to be rigorous about whether a tool genuinely helps people learn, create, or work better, and whether the benefit is large enough to justify the cost and complexity.
Building on that foundation, which single browser setting would you mandate by default across campus before rolling out any AI tools?
I would mandate one thing by default: require explicit permission prompts for camera, microphone, screen sharing, clipboard, and file access, with no persistent “always allow” unless it is centrally approved.
The reason is simple. Before you worry about model quality or adoption, you need to control data exposure. In an educational setting, one careless permission grant can expose student work, classroom discussions, or internal material. A browser should make those flows intentional, visible, and revocable.
From a cost-benefit perspective, this is a high-leverage default. The friction cost is small—one extra click when access is genuinely needed. The downside it prevents is much larger. It also sets the right tone institutionally: AI should be introduced as a governed tool, not an ambient one.
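As one illustration of deny-by-default at the web-app level, the standard `Permissions-Policy` HTTP response header lets a site refuse powerful browser features outright, so no page or embedded frame can even prompt for them. This is a sketch, not a full governance setup: institution-wide defaults (including blocking persistent "always allow" grants) would live in managed browser policy, and the exact feature list here is illustrative.

```http
Permissions-Policy: camera=(), microphone=(), display-capture=(), geolocation=()
```

An empty allowlist `()` disables the feature for the page and all embedded content; features a specific tool genuinely needs can then be granted narrowly, e.g. `camera=(self)`.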
Staying with the theme of speed, what specific “time-to-value in seconds” metric do you track for a browser-based AI feature, and why does it matter in higher education?
I would use a metric like “time to first meaningful result”—the number of seconds from when a user starts the task to when they see an output that is actually useful, not just a loading state or placeholder.
For a browser-based AI feature, that means I would measure the full path:
- page load
- any setup or upload friction
- prompt entry
- generation latency
- first usable output on screen
The point is to capture the whole journey, not just model inference time.
It matters in higher education because the bar for usefulness is unforgiving. If a student or instructor cannot get to value almost immediately, the tool starts to feel like overhead rather than help. In practice, that means attention drops, trust drops, and adoption drops. In a classroom, office hours, or a study session, even small delays compound because the user is usually trying to accomplish something specific under time pressure.
So the question I care about is not “how fast is the model,” but “how many seconds until the user feels helped?” That is the metric that best captures whether the product is actually viable in an educational setting.
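A minimal sketch of how that metric could be computed from session events. The event names and timestamps are hypothetical, not a real analytics schema; the only point is that the clock starts at page load, not at model invocation.

```python
# Hypothetical event log for one session: (event_name, seconds since session start).
events = [
    ("page_load_start", 0.0),
    ("page_interactive", 1.2),
    ("upload_complete", 4.8),
    ("prompt_submitted", 9.5),
    ("first_usable_output", 16.9),  # first output the user can actually act on
]

def time_to_first_meaningful_result(events):
    """Seconds from the start of the task to the first usable output.

    Covers the full path (page load, setup friction, prompt entry,
    generation latency), not just model inference time.
    """
    times = dict(events)
    return times["first_usable_output"] - times["page_load_start"]

print(time_to_first_meaningful_result(events))
```

Instrumented this way, a fast model behind a slow upload flow correctly scores poorly, which is exactly what the "seconds until the user feels helped" framing demands.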
Moving from metrics to implementation, what does your current browser-based AI tool stack look like for students and faculty, including any Python notebooks, ArcGIS Online, or LMS integrations you rely on?
My current stack is less built around higher education infrastructure specifically and more around delivering AI tools that work reliably in the browser for real users at scale. At a high level, I think about the stack in layers: the frontend experience, the model layer, the orchestration layer, and the measurement layer.
On the frontend side, the priority is making advanced AI feel simple and fast in the browser. Underneath that, we rely on a mix of third-party and open models depending on the task, with orchestration that handles routing, fallbacks, latency, and cost. We also invest heavily in product instrumentation because understanding time-to-value, completion rates, retention, and actual usage patterns matters as much as the model itself.
Conceptually, the same framework would apply in education. Whether the user is a creator, a student, or a faculty member, the goal is the same: browser-based AI has to be accessible, responsive, and trustworthy. The stack only matters insofar as it helps deliver that outcome.
With your econometrics skill set, how do you design a credible test in live courses to isolate the causal impact of an AI tool used in the browser?
I would start with a simple rule: never compare students who chose to use the AI tool against students who did not. That mainly measures selection, not impact. The cleanest design is to randomize access at the right level—ideally by section, instructor, or assignment, depending on where spillovers are least likely.
Then, I would define a small number of outcomes in advance, such as assignment completion, quiz performance, time on task, or course persistence, and measure intent-to-treat first (comparing everyone offered access with everyone not offered it), not just outcomes among the students who actually used the tool.
I would also collect baseline measures before the rollout—prior grades, attendance, engagement, and any other predictors of performance—so I can improve precision and check balance across groups. If randomization is not feasible, I would use the next-best design, such as a phased rollout or a difference-in-differences setup with a credible comparison group, but I would treat that as weaker evidence.
The key is to link browser-level usage data to real course outcomes while being disciplined about identification. The question is not whether the tool looks helpful, but whether students performed better because of it.
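A minimal sketch of the intent-to-treat comparison described above, assuming access is randomized at the section level. The section IDs, assignments, and scores are made up for illustration, and a real analysis would also cluster standard errors at the section level; this shows only the point estimate.

```python
from statistics import mean

# Each record: (section_id, assigned_access, outcome_score).
# Assignment follows the randomized section, regardless of actual usage.
records = [
    ("sec_a", True, 78), ("sec_a", True, 85), ("sec_a", True, 74),
    ("sec_b", False, 72), ("sec_b", False, 80), ("sec_b", False, 70),
    ("sec_c", True, 88), ("sec_c", True, 79),
    ("sec_d", False, 75), ("sec_d", False, 71),
]

def intent_to_treat_effect(records):
    """Difference in mean outcomes between assigned-access and control.

    ITT compares everyone offered the tool with everyone not offered it,
    ignoring actual usage, so the estimate is immune to self-selection
    into usage.
    """
    treated = [score for _, assigned, score in records if assigned]
    control = [score for _, assigned, score in records if not assigned]
    return mean(treated) - mean(control)

print(round(intent_to_treat_effect(records), 2))
```

Comparing `treated` users against `control` by assignment rather than by usage is exactly the discipline the answer calls for: it measures the effect of offering the tool, not the characteristics of students who chose to use it.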
Thinking about access and equity for children and adult learners, which precise setting—either in the browser or the AI tool—has most improved accessibility in seconds for your users?
The single setting that has improved accessibility most for our users is defaulting the AI tool to guided mode instead of expert mode. In practice, that means users see plain-language instructions, example inputs, and a few constrained choices rather than a blank prompt box and a long list of settings.
That matters because the biggest barrier for many children and adult learners is not raw capability; it is cognitive overhead. If the tool feels intimidating, people disengage before they get value. Guided mode reduces that friction in seconds. It makes the product more usable for people with less technical confidence, less prior exposure to AI tools, or less time to learn a new interface. For me, that has been one of the clearest examples of accessibility improving not through more features, but through better defaults.
On privacy and ethics, how do you configure AI and browser settings to comply with FERPA/GDPR while preserving the usefulness of the tools for research and teaching?
I would configure AI for education around one principle: minimum necessary data, maximum intentionality. In practice, that means strict permission prompts in the browser, no persistent access by default, single sign-on (SSO) and role-based controls, short retention, strong audit logs, and contracts that tightly limit what third-party vendors can do with student data.
For teaching and research, I would default to de-identified or synthetic data and only allow identified student data through a narrower, reviewed workflow. That preserves most of the usefulness of AI while staying aligned with the core FERPA and GDPR principles of legitimate use, data minimization, security, and privacy by default.
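One concrete ingredient of a de-identified default is keyed pseudonymization: replace student identifiers with opaque tokens before data reaches an AI workflow, so records can still be joined for analysis without exposing the raw ID. This is a sketch under assumptions; the `SALT` value and ID format are hypothetical, and a real deployment would keep the key in a secrets manager and rotate it under policy.

```python
import hashlib
import hmac

# Institutional secret kept outside the dataset (hypothetical value).
SALT = b"rotate-me-and-store-in-a-secrets-manager"

def pseudonymize(student_id: str) -> str:
    """Deterministically map a student ID to an opaque token.

    Same ID -> same token, so longitudinal analysis still works, but the
    raw identifier never leaves the institution. A keyed HMAC is used so
    tokens cannot be reversed by brute-forcing plausible IDs without the key.
    """
    return hmac.new(SALT, student_id.encode(), hashlib.sha256).hexdigest()[:16]

print(pseudonymize("s1234567"))
```

This supports the data-minimization principle directly: the identified-data path stays narrow and reviewed, while the default analytical path only ever sees tokens.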
To leave readers with something actionable, what 60-second pre-launch checklist do you personally run before turning on a new AI feature in the browser for your campus?
Before turning on any new browser-based AI feature in an educational environment, I run a very short checklist:
1. Is the use case narrow and clear? Can I explain in one sentence what problem this solves for a student, instructor, or staff member?
2. Does it avoid unnecessary data exposure? Are there no sensitive inputs unless they are truly required, and are permissions limited to the minimum needed?
3. Is time-to-value fast enough? Can a user get to a meaningful result in seconds rather than feeling stuck in setup or latency?
4. Are the failure states acceptable? If the model is wrong, slow, or unavailable, does the user understand what happened and have a safe fallback?
5. Is the output trustworthy enough for the setting? It doesn’t have to be perfect, but it should be good enough that it helps more than it harms.
6. Do I have measurement and rollback in place? Can I see adoption, quality, and errors quickly, and turn it off just as quickly if needed?
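The checklist above can be mechanized as a simple launch gate so a feature flag only flips on when every item has been affirmatively answered. The item names are hypothetical labels for the six questions; the design point is that any single failing item blocks launch, and rollback readiness is itself a gated item.

```python
# Hypothetical flags mirroring the six checklist questions above.
CHECKLIST = [
    "use_case_is_narrow_and_clear",
    "data_exposure_is_minimal",
    "time_to_value_is_seconds",
    "failure_states_are_safe",
    "output_quality_is_sufficient",
    "measurement_and_rollback_ready",
]

def launch_decision(review: dict) -> bool:
    """Enable the feature only if every checklist item passed.

    Missing items count as failures, so an incomplete review cannot launch.
    """
    return all(review.get(item, False) for item in CHECKLIST)

review = {item: True for item in CHECKLIST}
assert launch_decision(review)

review["measurement_and_rollback_ready"] = False
assert not launch_decision(review)  # one failing item blocks launch
```

Defaulting unknown items to `False` encodes the governance stance of the interview: the burden of proof is on enabling the feature, not on turning it off.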
Thanks for sharing your knowledge and expertise. Is there anything else you'd like to add?
One thing I would add is that most AI rollouts fail not because the models are weak, but because the product and the governance are weak. People often focus on capability first, when they should focus on clarity, speed, trust, and measurable value.
For me, the right question is never just whether an AI tool can do something. It is whether it helps real people do meaningful work better, with a benefit that clearly outweighs the cost, risk, and complexity. If institutions keep that standard, they will make better decisions and avoid a lot of noise.