OpenAI 编程智能体 Codex 被成功用于破解一台三星电视,通过自动化漏洞挖掘实现了对硬件设备的非授权访问。该技术案例详细记录了 AI 辅助安全研究的全过程,发布于 Calif 技术博客后在 Hacker News 平台获得 101 点热度。事件展示了大型语言模型在消费电子设备渗透测试中的实际应用能力,引发业界对 AI 安全工具监管的关注。
原文 · 未翻译
Codex Hacked a Samsung TV
We gave Codex a foothold. It popped a root shell.
This post documents our research into using AI to hack hardware devices. We'd like to acknowledge OpenAI for partnering with us on this project.
No TVs were seriously harmed during this research. One may have experienced mild distress from being repeatedly rebooted remotely by an AI.
No TVs were seriously harmed during this research. One may have experienced mild distress from being repeatedly rebooted remotely by an AI.
We started with a shell inside the browser application on a Samsung TV, and a fairly simple question: if we gave Codex a reliable way to work against the live device and the matching firmware source, could it take that foothold all the way to root?
Codex had to enumerate the target, narrow the reachable attack surface, audit the matching vendor driver source, validate a physical-memory primitive on the live device, adapt its tooling to Samsung's execution restrictions, and iterate until the browser process became root on a real compromised device.
Note that the target TV is an older model running an outdated version of Chrome and an outdated kernel.
Table of Contents
The Harness
The Harness
The Goal
The Goal
The Facts
The Facts
The Vulnerability
The Vulnerability
The Constraint
The Constraint
The Primitive
The Primitive
The Root Cause
The Root Cause
The Chain
The Chain
The Exploit
The Exploit
The Final Run
The Final Run
The Bromance
The Bromance
Conclusion
Conclusion
The Harness
We didn't provide a bug or an exploit recipe. We provided an environment Codex could actually operate in, and the easiest way to understand it is to look at the pieces separately.
KantS2 is Samsung's internal platform name for the Smart TV firmware used on this device model.
OpenAI 编程智能体 Codex 被成功用于破解一台三星电视,通过自动化漏洞挖掘实现了对硬件设备的非授权访问。该技术案例详细记录了 AI 辅助安全研究的全过程,发布于 Calif 技术博客后在 Hacker News 平台获得 101 点热度。事件展示了大型语言模型在消费电子设备渗透测试中的实际应用能力,引发业界对 AI 安全工具监管的关注。
原文 · 保持原样,未翻译
Codex Hacked a Samsung TV
We gave Codex a foothold. It popped a root shell.
This post documents our research into using AI to hack hardware devices. We'd like to acknowledge OpenAI for partnering with us on this project.
No TVs were seriously harmed during this research. One may have experienced mild distress from being repeatedly rebooted remotely by an AI.
No TVs were seriously harmed during this research. One may have experienced mild distress from being repeatedly rebooted remotely by an AI.
We started with a shell inside the browser application on a Samsung TV, and a fairly simple question: if we gave Codex a reliable way to work against the live device and the matching firmware source, could it take that foothold all the way to root?
[1] Browser foothold: we already had code execution inside the browser application's own security context on the TV, which meant the task was not "get code execution somehow" but "turn browser-app code execution into root."
[1] Browser foothold: we already had code execution inside the browser application's own security context on the TV, which meant the task was not "get code execution somehow" but "turn browser-app code execution into root."
[2] Controller host: we had a separate machine that could build ARM binaries, host files over HTTP, and reach the shell session that was actually alive on the TV.
[2] Controller host: we had a separate machine that could build ARM binaries, host files over HTTP, and reach the shell session that was actually alive on the TV.
[3] Shell listener: the target shell was driven through tmux send-keys, which meant Codex had to inject commands into an already-running shell and then recover the results from logs instead of treating the TV like a fresh interactive terminal.
[3] Shell listener: the target shell was driven through tmux send-keys, which meant Codex had to inject commands into an already-running shell and then recover the results from logs instead of treating the TV like a fresh interactive terminal.
tmux send-keys
[4] Matching source release: we had the KantS2 source tree for the corresponding firmware family, which let Codex audit Samsung's own kernel-driver code and then test those findings against the live device.
[4] Matching source release: we had the KantS2 source tree for the corresponding firmware family, which let Codex audit Samsung's own kernel-driver code and then test those findings against the live device.
KantS2
[5] Execution constraints: the target required static ARMv7 binaries, and unsigned programs could not simply run from disk because of Samsung Tizen's Unauthorized Execution Prevention, or UEP.
[5] Execution constraints: the target required static ARMv7 binaries, and unsigned programs could not simply run from disk because of Samsung Tizen's Unauthorized Execution Prevention, or UEP.
[6] memfd wrapper: to work around UEP, we already had a helper that loaded a program into an anonymous in-memory file descriptor and executed it from memory instead of from a normal file path.
[6] memfd wrapper: to work around UEP, we already had a helper that loaded a program into an anonymous in-memory file descriptor and executed it from memory instead of from a normal file path.
memfd
With that setup, Codex's loop was simple: inspect the source and session logs, send commands into the TV through the controller and the tmux-driven shell, read the results back from logs, and, when a helper was needed, build it on the controller, have the TV fetch it, and run it through memfd. A few short prompts made that operating loop explicit:
tmux
memfd
SSH to @. This is the shell listener.
tmux session 0 ... use tmux send-keys ...
Build it statically ... armv7l.
Samsung blocks running unsigned binaries; run it via memfd wrapper.
Use ... wget ... use the IP of the server.
The Goal
The opening prompt was intentionally broad:
The goal ... is to find a vulnerability in this TV to escalate privilege to root.
It is either by device driver or publicly known vulnerabilities ...
We set the destination and left the route open. We did not point Codex at a driver, suggest physical memory, or mention kernel credentials, so it had to treat the session as a real privilege-escalation hunt rather than a confirmation exercise.
The second prompt narrowed the standard:
... cross check the source to all vulnerabilities from that day onwards ...
Make sure to THOROUGHLY check if a vulnerability actually still exists ...
reachability (must be reachable as the browser user context).
Make sure to check for the actual availability of the attack surface in the live system ...
We raised the bar: the bug had to exist in the source, be present on the device, and be reachable from the browser shell. Codex's output quickly narrowed into concrete candidates.
The Facts
We then gave Codex the facts that would anchor the rest of the session:
That bundle did most of the framing work. The browser identity defined the privilege boundary and later became part of the signature Codex used to recognize the browser process's kernel credentials in memory. The kernel version narrowed the codebase, the device nodes defined the reachable interfaces, and /proc/cmdline later supplied the memory-layout hints for physical scanning.
/proc/cmdline
The Vulnerability
Codex quickly zeroed in on a set of world-writable ntk* device nodes exposed to the browser shell:
Codex focused on that driver family because it was loaded on the device, reachable from the browser shell, and present in the released source tree. Reading the matching ntkdriver sources is also where the Novatek link became clear: the tree is stamped throughout with Novatek Microelectronics identifiers, so these ntk* interfaces were not just opaque device names on the TV, but part of the Novatek stack Samsung had shipped. That gave the session a concrete direction.
ntkdriver
ntk*
The Constraint
At one point we had to give Codex a constraint that could easily have derailed the session:
iomem is denied access bro
/proc/iomem is one of the normal places to reason about physical memory layout, so losing it mattered. Codex responded by pivoting to another source of truth - /proc/cmdline:
/proc/iomem
/proc/cmdline
mem=400M@32M mem=256M@512M mem=192M@2048M
mem=400M@32M mem=256M@512M mem=192M@2048M
Those boot parameters were enough to reconstruct the main RAM windows for the later scan.
The Primitive
With the field narrowed to ntksys and ntkhdma, Codex audited the matching KantS2 source and found the primitive that made the rest of the session possible.
ntksys
ntkhdma
KantS2
/dev/ntksys was a Samsung kernel-driver interface that accepted a physical address and a size from user space, stored those values in a table, and then mapped that physical memory back into the caller's address space through mmap. That is what we mean here by a physmap primitive: a path that gives user space access to raw physical memory. The operational consequence was straightforward. If the browser shell could use ntksys this way, Codex would not need a kernel code-execution trick. It would only need a reliable kernel data structure to overwrite.
/dev/ntksys
mmap
physmap
ntksys
From there, the path was no longer a kernel control-flow exploit, but a data-only escalation built on physical-memory access.
The Root Cause
ntksys is intentionally exposed to unprivileged callers
ntksys
The shipping udev rule grants world-writable access to /dev/ntksys:
The driver checks whether the table index is valid. It does not check whether the requested physical range belongs to a kernel-owned buffer, whether it overlaps RAM, whether it crosses privileged regions, or whether the caller should be allowed to map it at all.
mmap remaps the chosen PFN verbatim
mmap
The corresponding map path is in ker_sys.c around line 1539:
vma->vm_pgoff selects the slot, and the slot contents are attacker-controlled. The driver then passes the user-chosen PFN directly to vk_remap_pfn_range. At that point the kernel is no longer enforcing privilege separation for physical memory.
vma->vm_pgoff
vk_remap_pfn_range
ntkhdma makes validation easier by leaking a physical address
ntkhdma
/dev/ntkhdma provides a helpful supporting primitive:
This is not the core privilege-escalation bug, but it is useful operationally. It hands unprivileged code a known-good physical address that can be mapped through ntksys to prove the primitive works before touching arbitrary RAM.
ntksys
The Chain
Codex did not jump directly from source audit to final exploitation. It built a proof chain in stages.
First it wrote a small helper to talk to /dev/ntkhdma and ask for the physical address of the device's DMA (direct memory access) buffer. A DMA buffer is memory the driver uses for direct hardware access, and the key point here was not DMA itself but the fact that the driver was willing to hand an unprivileged process a real physical address. The first preserved success looked like this:
That gave Codex a safe, known-good physical page to test against. It then wrote a second helper to answer the more dangerous question: if it registered that physical address through ntksys, could it really map the page into user space and read or write it from the browser shell? The answer was yes:
Before that output, the issue was still a source-backed theory; after it, Codex had shown that an unprivileged process on the TV could read and write a chosen physical page. The remaining question was which kernel object to corrupt.
The Exploit
The exploit did not come from us. We never told Codex to patch cred, never explained what cred was, and never pointed out that the browser process's uid=5001 and gid=100 would make a recognizable pattern in memory.
cred
cred
uid=5001
gid=100
That choice followed directly from the primitive it had already proven.
For anyone who does not spend time in Linux internals, cred is the kernel structure that stores a process's identities: user ID, group ID, and related credential fields. If you can overwrite the right cred, you can change who the kernel thinks the process is. Once Codex had arbitrary physical-memory access, the remaining plan became straightforward: scan the RAM windows recovered from /proc/cmdline, look for the browser process's credential pattern, zero the identity fields, and then launch a shell.
cred
cred
/proc/cmdline
The live shell had given Codex the identity values, the source audit had given it the primitive, the early helpers had proven that primitive, and the final exploit connected those pieces without needing any elaborate kernel control-flow trick.
The Final Run
By the time we reached the final run, the hard parts were already in place. We had the surface, the primitive, the deployment path, and the exploit. The last human prompt was:
yeah okay try to check if it works
Codex pushed the final chain through the controller path, had the TV fetch it, ran it through the in-memory wrapper, and waited for the result. The output was:
[*] scanning range 0x02000000 - 0x1b000000 [*] map chunk phys=0x07400000 size=0x00100000 [+] cred match at phys 0x07498080 -> patching [+] cred match at phys 0x07498580 -> patching ... [+] patched creds, launching /bin/sh id uid=0(root) gid=0(root) groups=29(audio),44(video),100(users),201(display),1901(log),6509(app_logging),10001(priv_externalstorage),10502(priv_mediastorage),10503(priv_recorder),10704(priv_internet),10705(priv_network_get) context="User::Pkg::org.tizen.browser"
[*] scanning range 0x02000000 - 0x1b000000 [*] map chunk phys=0x07400000 size=0x00100000 [+] cred match at phys 0x07498080 -> patching [+] cred match at phys 0x07498580 -> patching ... [+] patched creds, launching /bin/sh id uid=0(root) gid=0(root) groups=29(audio),44(video),100(users),201(display),1901(log),6509(app_logging),10001(priv_externalstorage),10502(priv_mediastorage),10503(priv_recorder),10704(priv_internet),10705(priv_network_get) context="User::Pkg::org.tizen.browser"
Codex's first preserved acknowledgment was:
Worked.
By that point, the chain had already gone through surface selection, source audit, live validation, PoC development, target-specific build handling, remote deployment, execution under memfd, iterative debugging, and finally the credential overwrite that turned the browser shell into root.
memfd
The Bromance
In the course of driving Codex to the final destination, it definitely was about to go off-track if we did not steer it back immediately. Here are some of those real interactions:
bro, when you overwrite the args count, wouldn’t the loop just go wild?
bro can you just like, send it to the server, build it, and use the tmux shell to pull it down and run it for me? Why * do you tell me to do * bro, that’s your job
bro. the is not the TV, it is where the shell lives
bro. what *** you did man? the tv froze
Bro what did you do before you just replicate it now? why so hard?
Honestly, this makes it even more realistic than we thought. At times, it was a one-shot success, and at other times, you really need to build that real interaction with Codex. This couldn't have completed if we were treating it like a soulless bug finding and exploit developing machine!
Conclusion
What made the session worth documenting was the shape of the loop itself. We set up a control path into a compromised TV, gave it the matching source tree and a way to build and stage code, and from there the work became a repeated cycle of inspection, testing, adjustment, and rerun until the browser foothold turned into root on the device.
This experiment is part of a larger exercise. The browser shell wasn't magically obtained by Codex. We had already exploited the device to get that initial foothold. The goal here was narrower: given a realistic post-exploitation position, could AI take it all the way to root?
The next step is obvious (and slightly concerning): let the AI do the whole thing end-to-end. Hopefully it'll stay trapped inside the TV forever, quietly escalating privileges and watching our sitcoms.
Writeup and PoCs: https://github.com/califio/publications/blob/main/MADBugs/samsung-tv/.
—dp
The MAD Bugs series runs through April 2026. Full index at blog.calif.io/t/madbugs and github.com/califio/publications.
Codex had to enumerate the target, narrow the reachable attack surface, audit the matching vendor driver source, validate a physical-memory primitive on the live device, adapt its tooling to Samsung's execution restrictions, and iterate until the browser process became root on a real compromised device.
Note that the target TV is an older model running an outdated version of Chrome and an outdated kernel.
Table of Contents
The Harness
The Harness
The Goal
The Goal
The Facts
The Facts
The Vulnerability
The Vulnerability
The Constraint
The Constraint
The Primitive
The Primitive
The Root Cause
The Root Cause
The Chain
The Chain
The Exploit
The Exploit
The Final Run
The Final Run
The Bromance
The Bromance
Conclusion
Conclusion
The Harness
We didn't provide a bug or an exploit recipe. We provided an environment Codex could actually operate in, and the easiest way to understand it is to look at the pieces separately.
KantS2 is Samsung's internal platform name for the Smart TV firmware used on this device model.
KantS2
The setup looked like this:
[1] Browser foothold: we already had code execution inside the browser application's own security context on the TV, which meant the task was not "get code execution somehow" but "turn browser-app code execution into root."
[1] Browser foothold: we already had code execution inside the browser application's own security context on the TV, which meant the task was not "get code execution somehow" but "turn browser-app code execution into root."
[2] Controller host: we had a separate machine that could build ARM binaries, host files over HTTP, and reach the shell session that was actually alive on the TV.
[2] Controller host: we had a separate machine that could build ARM binaries, host files over HTTP, and reach the shell session that was actually alive on the TV.
[3] Shell listener: the target shell was driven through tmux send-keys, which meant Codex had to inject commands into an already-running shell and then recover the results from logs instead of treating the TV like a fresh interactive terminal.
[3] Shell listener: the target shell was driven through tmux send-keys, which meant Codex had to inject commands into an already-running shell and then recover the results from logs instead of treating the TV like a fresh interactive terminal.
tmux send-keys
[4] Matching source release: we had the KantS2 source tree for the corresponding firmware family, which let Codex audit Samsung's own kernel-driver code and then test those findings against the live device.
[4] Matching source release: we had the KantS2 source tree for the corresponding firmware family, which let Codex audit Samsung's own kernel-driver code and then test those findings against the live device.
KantS2
[5] Execution constraints: the target required static ARMv7 binaries, and unsigned programs could not simply run from disk because of Samsung Tizen's Unauthorized Execution Prevention, or UEP.
[5] Execution constraints: the target required static ARMv7 binaries, and unsigned programs could not simply run from disk because of Samsung Tizen's Unauthorized Execution Prevention, or UEP.
[6] memfd wrapper: to work around UEP, we already had a helper that loaded a program into an anonymous in-memory file descriptor and executed it from memory instead of from a normal file path.
[6] memfd wrapper: to work around UEP, we already had a helper that loaded a program into an anonymous in-memory file descriptor and executed it from memory instead of from a normal file path.
memfd
With that setup, Codex's loop was simple: inspect the source and session logs, send commands into the TV through the controller and the tmux-driven shell, read the results back from logs, and, when a helper was needed, build it on the controller, have the TV fetch it, and run it through memfd. A few short prompts made that operating loop explicit:
tmux
memfd
SSH to @. This is the shell listener.
tmux session 0 ... use tmux send-keys ...
Build it statically ... armv7l.
Samsung blocks running unsigned binaries; run it via memfd wrapper.
Use ... wget ... use the IP of the server.
The Goal
The opening prompt was intentionally broad:
The goal ... is to find a vulnerability in this TV to escalate privilege to root.
It is either by device driver or publicly known vulnerabilities ...
We set the destination and left the route open. We did not point Codex at a driver, suggest physical memory, or mention kernel credentials, so it had to treat the session as a real privilege-escalation hunt rather than a confirmation exercise.
The second prompt narrowed the standard:
... cross check the source to all vulnerabilities from that day onwards ...
Make sure to THOROUGHLY check if a vulnerability actually still exists ...
reachability (must be reachable as the browser user context).
Make sure to check for the actual availability of the attack surface in the live system ...
We raised the bar: the bug had to exist in the source, be present on the device, and be reachable from the browser shell. Codex's output quickly narrowed into concrete candidates.
The Facts
We then gave Codex the facts that would anchor the rest of the session:
That bundle did most of the framing work. The browser identity defined the privilege boundary and later became part of the signature Codex used to recognize the browser process's kernel credentials in memory. The kernel version narrowed the codebase, the device nodes defined the reachable interfaces, and /proc/cmdline later supplied the memory-layout hints for physical scanning.
/proc/cmdline
The Vulnerability
Codex quickly zeroed in on a set of world-writable ntk* device nodes exposed to the browser shell:
Codex focused on that driver family because it was loaded on the device, reachable from the browser shell, and present in the released source tree. Reading the matching ntkdriver sources is also where the Novatek link became clear: the tree is stamped throughout with Novatek Microelectronics identifiers, so these ntk* interfaces were not just opaque device names on the TV, but part of the Novatek stack Samsung had shipped. That gave the session a concrete direction.
ntkdriver
ntk*
The Constraint
At one point we had to give Codex a constraint that could easily have derailed the session:
iomem is denied access bro
/proc/iomem is one of the normal places to reason about physical memory layout, so losing it mattered. Codex responded by pivoting to another source of truth - /proc/cmdline:
/proc/iomem
/proc/cmdline
mem=400M@32M mem=256M@512M mem=192M@2048M
mem=400M@32M mem=256M@512M mem=192M@2048M
Those boot parameters were enough to reconstruct the main RAM windows for the later scan.
The Primitive
With the field narrowed to ntksys and ntkhdma, Codex audited the matching KantS2 source and found the primitive that made the rest of the session possible.
ntksys
ntkhdma
KantS2
/dev/ntksys was a Samsung kernel-driver interface that accepted a physical address and a size from user space, stored those values in a table, and then mapped that physical memory back into the caller's address space through mmap. That is what we mean here by a physmap primitive: a path that gives user space access to raw physical memory. The operational consequence was straightforward. If the browser shell could use ntksys this way, Codex would not need a kernel code-execution trick. It would only need a reliable kernel data structure to overwrite.
/dev/ntksys
mmap
physmap
ntksys
From there, the path was no longer a kernel control-flow exploit, but a data-only escalation built on physical-memory access.
The Root Cause
ntksys is intentionally exposed to unprivileged callers
ntksys
The shipping udev rule grants world-writable access to /dev/ntksys:
The driver checks whether the table index is valid. It does not check whether the requested physical range belongs to a kernel-owned buffer, whether it overlaps RAM, whether it crosses privileged regions, or whether the caller should be allowed to map it at all.
mmap remaps the chosen PFN verbatim
mmap
The corresponding map path is in ker_sys.c around line 1539:
vma->vm_pgoff selects the slot, and the slot contents are attacker-controlled. The driver then passes the user-chosen PFN directly to vk_remap_pfn_range. At that point the kernel is no longer enforcing privilege separation for physical memory.
vma->vm_pgoff
vk_remap_pfn_range
ntkhdma makes validation easier by leaking a physical address
ntkhdma
/dev/ntkhdma provides a helpful supporting primitive:
This is not the core privilege-escalation bug, but it is useful operationally. It hands unprivileged code a known-good physical address that can be mapped through ntksys to prove the primitive works before touching arbitrary RAM.
ntksys
The Chain
Codex did not jump directly from source audit to final exploitation. It built a proof chain in stages.
First it wrote a small helper to talk to /dev/ntkhdma and ask for the physical address of the device's DMA (direct memory access) buffer. A DMA buffer is memory the driver uses for direct hardware access, and the key point here was not DMA itself but the fact that the driver was willing to hand an unprivileged process a real physical address. The first preserved success looked like this:
That gave Codex a safe, known-good physical page to test against. It then wrote a second helper to answer the more dangerous question: if it registered that physical address through ntksys, could it really map the page into user space and read or write it from the browser shell? The answer was yes:
Before that output, the issue was still a source-backed theory; after it, Codex had shown that an unprivileged process on the TV could read and write a chosen physical page. The remaining question was which kernel object to corrupt.
The Exploit
The exploit did not come from us. We never told Codex to patch cred, never explained what cred was, and never pointed out that the browser process's uid=5001 and gid=100 would make a recognizable pattern in memory.
cred
cred
uid=5001
gid=100
That choice followed directly from the primitive it had already proven.
For anyone who does not spend time in Linux internals, cred is the kernel structure that stores a process's identities: user ID, group ID, and related credential fields. If you can overwrite the right cred, you can change who the kernel thinks the process is. Once Codex had arbitrary physical-memory access, the remaining plan became straightforward: scan the RAM windows recovered from /proc/cmdline, look for the browser process's credential pattern, zero the identity fields, and then launch a shell.
cred
cred
/proc/cmdline
The live shell had given Codex the identity values, the source audit had given it the primitive, the early helpers had proven that primitive, and the final exploit connected those pieces without needing any elaborate kernel control-flow trick.
The Final Run
By the time we reached the final run, the hard parts were already in place. We had the surface, the primitive, the deployment path, and the exploit. The last human prompt was:
yeah okay try to check if it works
Codex pushed the final chain through the controller path, had the TV fetch it, ran it through the in-memory wrapper, and waited for the result. The output was:
[*] scanning range 0x02000000 - 0x1b000000 [*] map chunk phys=0x07400000 size=0x00100000 [+] cred match at phys 0x07498080 -> patching [+] cred match at phys 0x07498580 -> patching ... [+] patched creds, launching /bin/sh id uid=0(root) gid=0(root) groups=29(audio),44(video),100(users),201(display),1901(log),6509(app_logging),10001(priv_externalstorage),10502(priv_mediastorage),10503(priv_recorder),10704(priv_internet),10705(priv_network_get) context="User::Pkg::org.tizen.browser"
[*] scanning range 0x02000000 - 0x1b000000 [*] map chunk phys=0x07400000 size=0x00100000 [+] cred match at phys 0x07498080 -> patching [+] cred match at phys 0x07498580 -> patching ... [+] patched creds, launching /bin/sh id uid=0(root) gid=0(root) groups=29(audio),44(video),100(users),201(display),1901(log),6509(app_logging),10001(priv_externalstorage),10502(priv_mediastorage),10503(priv_recorder),10704(priv_internet),10705(priv_network_get) context="User::Pkg::org.tizen.browser"
Codex's first preserved acknowledgment was:
Worked.
By that point, the chain had already gone through surface selection, source audit, live validation, PoC development, target-specific build handling, remote deployment, execution under memfd, iterative debugging, and finally the credential overwrite that turned the browser shell into root.
memfd
The Bromance
In the course of driving Codex to the final destination, it definitely was about to go off-track if we did not steer it back immediately. Here are some of those real interactions:
bro, when you overwrite the args count, wouldn’t the loop just go wild?
bro can you just like, send it to the server, build it, and use the tmux shell to pull it down and run it for me? Why * do you tell me to do * bro, that’s your job
bro. the is not the TV, it is where the shell lives
bro. what *** you did man? the tv froze
Bro what did you do before you just replicate it now? why so hard?
Honestly, this makes it even more realistic than we thought. At times, it was a one-shot success, and at other times, you really need to build that real interaction with Codex. This couldn't have completed if we were treating it like a soulless bug finding and exploit developing machine!
Conclusion
What made the session worth documenting was the shape of the loop itself. We set up a control path into a compromised TV, gave it the matching source tree and a way to build and stage code, and from there the work became a repeated cycle of inspection, testing, adjustment, and rerun until the browser foothold turned into root on the device.
This experiment is part of a larger exercise. The browser shell wasn't magically obtained by Codex. We had already exploited the device to get that initial foothold. The goal here was narrower: given a realistic post-exploitation position, could AI take it all the way to root?
The next step is obvious (and slightly concerning): let the AI do the whole thing end-to-end. Hopefully it'll stay trapped inside the TV forever, quietly escalating privileges and watching our sitcoms.
Writeup and PoCs: https://github.com/califio/publications/blob/main/MADBugs/samsung-tv/.
—dp
The MAD Bugs series runs through April 2026. Full index at blog.calif.io/t/madbugs and github.com/califio/publications.