Problems while switching C stacks (fibers)

  • From: Konstantin Olkhovskiy <lupus@xxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Mon, 6 May 2013 14:38:33 +0400

Hey,

I'm trying to integrate my fiber library (based on libcoro) with LuaJIT.
Fibers are created purely through the Lua/C API, using a trampoline C fiber
that roughly does the following:
- calls lua_newthread to create a new "thread" for the fiber
- determines the parent Lua thread (if any; otherwise the root Lua state)
- calls lua_xmove to move two values (the function and a single argument)
  from the parent state to the thread's state
- calls lua_pcall on the thread's state

All of the above is performed inside a C fiber on a separate stack.
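As a rough illustration of the trampoline steps, here is a pure-Lua analogue. It is not the library's actual trampoline and it does not switch C stacks: coroutine.create plays the role of lua_newthread, handing fn and arg to coroutine.resume stands in for lua_xmove(parent, thread, 2), and pcall inside the new thread stands in for lua_pcall. The name spawn is hypothetical. One C-side detail worth keeping in mind: the Lua 5.1 manual notes that threads are subject to garbage collection like any Lua object, so the thread pushed by lua_newthread has to stay referenced (for example via luaL_ref into the registry) for as long as the fiber runs. Here, returning co serves that purpose.

```lua
-- Pure-Lua analogue of the trampoline steps (no C stack switching).
-- `spawn` is a hypothetical name used only for this sketch.
local function spawn(fn, arg)
  -- coroutine.create allocates a new Lua thread, as lua_newthread does in C.
  local co = coroutine.create(function(f, a)
    -- The protected call runs inside the new thread, as lua_pcall does.
    return pcall(f, a)
  end)
  -- Passing fn and arg to the thread corresponds to lua_xmove(parent, thread, 2).
  local resumed, ok, result = coroutine.resume(co, fn, arg)
  -- Returning co keeps the thread referenced, so the GC cannot collect it.
  return co, resumed and ok, result
end

local co, ok, result = spawn(function(x) return x * 2 end, 21)
```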

I've created a small TCP echo server with two handler fibers that look like
this:

function handler(link)
  local g
  while link:wait_read() do
    g = link:read()
    link:write(g)
    fbr.log_d"Handler 1 processed a g"
    if nil == g then
      fbr.log_d"Handler 1 got nil, exiting"
      return
    end
  end
end

function handler2(link)
  local g
  while link:wait_read() do
    g = link:read()
    link:write(g)
    fbr.log_d"Handler 2 processed a g"
    if nil == g then
      fbr.log_d"Handler 2 got nil, exiting"
      return
    end
  end
end
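handler and handler2 differ only in the number in their log messages. A hypothetical factory can build both from one definition while keeping the original scoping of g (declared before the loop). make_handler, its log parameter (standing in for fbr.log_d) and the in-memory mock_link are illustrative assumptions, not part of the fiber library.

```lua
-- Hypothetical factory: one definition for both handlers.
-- `log` stands in for fbr.log_d so the sketch runs outside the fiber library.
local function make_handler(n, log)
  return function(link)
    local g  -- declared before the loop, as in the original handlers
    while link:wait_read() do
      g = link:read()
      link:write(g)
      log("Handler " .. n .. " processed a g")
      if nil == g then
        log("Handler " .. n .. " got nil, exiting")
        return
      end
    end
  end
end

-- Hypothetical in-memory link: returns each input once, then nil (end of stream).
local function mock_link(inputs)
  local reads = 0
  local link = {written = {}}
  function link:wait_read() return reads <= #inputs end
  function link:read()
    reads = reads + 1
    return inputs[reads]
  end
  function link:write(v)
    if v ~= nil then self.written[#self.written + 1] = v end
  end
  return link
end

local messages = {}
local link = mock_link({"x", "y"})
make_handler(1, function(m) messages[#messages + 1] = m end)(link)
```

With the real library, the chain could then be configured as server:chain_set{make_handler(1, fbr.log_d), make_handler(2, fbr.log_d)}, assuming fbr.log_d can be passed around as an ordinary function taking one string.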

The fibers are linked in a "chain", and a "link" object is passed to each
fiber. This object can be read from and written to. The fibers simply push
their input forward without any processing. The link's wait_read and read
methods can yield execution to another C (or Lua) fiber.

In Lua, the chain is set up like this (chain_set is also implemented with the
Lua/C API, so that it accepts both FFI function pointers and plain Lua
functions):

server:chain_set{handler, handler2}
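To make the chain wiring and the yielding reads concrete, here is a pure-Lua model. The real link is implemented by the C library and switches C stacks; this model uses Lua coroutines instead, so it cannot reproduce the crash. new_queue, Link, run_chain and forward are hypothetical names, and unlike chain_set, run_chain accepts only plain Lua functions, not FFI function pointers.

```lua
-- Pure-Lua model of a handler chain; every name here is hypothetical.
-- Queues connect neighbouring handlers; coroutines stand in for fibers.
local function new_queue()
  return {items = {}, head = 1, closed = false}
end

local Link = {}
Link.__index = Link

function Link:wait_read()
  local q = self.inq
  -- Nothing to read yet: yield to the scheduler. This is the point where
  -- the real library would switch to another fiber.
  while q.head > #q.items and not q.closed do
    coroutine.yield()
  end
  return true
end

function Link:read()
  local q = self.inq
  if q.head > #q.items then return nil end  -- drained and closed: end of stream
  local v = q.items[q.head]
  q.head = q.head + 1
  return v
end

function Link:write(v)
  local q = self.outq
  if v == nil then
    q.closed = true  -- propagate end of stream downstream
  else
    q.items[#q.items + 1] = v
  end
end

-- Wires handlers so each one's output feeds the next, then runs them
-- round-robin until every coroutine has finished.
local function run_chain(handlers, inputs)
  local queues = {}
  for i = 1, #handlers + 1 do queues[i] = new_queue() end
  for _, v in ipairs(inputs) do table.insert(queues[1].items, v) end
  queues[1].closed = true

  local cos = {}
  for i, h in ipairs(handlers) do
    local link = setmetatable({inq = queues[i], outq = queues[i + 1]}, Link)
    cos[i] = coroutine.create(function() h(link) end)
  end

  local running = true
  while running do
    running = false
    for _, co in ipairs(cos) do
      if coroutine.status(co) ~= "dead" then
        assert(coroutine.resume(co))
        running = running or coroutine.status(co) ~= "dead"
      end
    end
  end
  return queues[#queues].items
end

-- Forwarding handler with the same control flow as the originals, minus logging.
local function forward(link)
  local g
  while link:wait_read() do
    g = link:read()
    link:write(g)
    if nil == g then return end
  end
end

local out = run_chain({forward, forward}, {"a", "b"})
```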

It crashes somewhere inside LuaJIT with the following uninformative backtrace
(though frame #2 looks suspicious):
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6f8f325 in ?? () from /usr/lib/libluajit-5.1.so.2
(gdb) bt
#0  0x00007ffff6f8f325 in ?? () from /usr/lib/libluajit-5.1.so.2
#1  0x00007fffc81dff04 in ?? ()
#2  0x0000000000000000 in ?? ()

But if I move "local g" inside the while loop, it works fine. Another
workaround is removing the second Lua handler: with only one handler it runs
stably no matter where "g" is declared.
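A minimal sketch of the scoping change described above: g is declared inside the while loop, so each iteration gets a fresh local. In plain Lua the handler behaves exactly as before; the sketch only shows the change and does not explain the crash. handler_scoped and its log parameter (standing in for fbr.log_d) are hypothetical.

```lua
-- Workaround variant: `g` is a loop-local, fresh on every iteration.
-- `log` is a hypothetical parameter standing in for fbr.log_d.
local function handler_scoped(link, log)
  while link:wait_read() do
    local g = link:read()
    link:write(g)
    log("Handler processed a g")
    if nil == g then
      log("Handler got nil, exiting")
      return
    end
  end
end
```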

Is there an explanation for this behaviour? I suspect that I have corrupted
the C stack somewhere along the way, but if I use pure C fibers in a chain
(even with the chain itself configured from Lua), it works fine and does not
crash.

Are there any caveats to using separate C stacks with separate Lua
"threads"?

-- 
Regards,
Konstantin
