Steve's Blog

Random comments on .NET, debugging, and more.

Building a mixed-mode stack walker - Part 2

(Part 1 is here)

When I left off in Part 1, I had a stack walker based on IDebug* that could successfully unwind a mixed-mode stack and resolve the native frames to symbols, but the managed sections of the stack were still unresolved.  In this post I'll talk about how to resolve those managed frames to managed MethodDescs and turn those into names, all without using ICorDebug or the CLR profiling APIs.  This method also works on both live and dump targets.  Just a friendly warning here though: none of what follows is documented or supported by MS, and it is subject to change at any time.  Also, by using headers/idl from the SSCLI, you may be taking a dependency on that licensing (but I’m not a lawyer, so don’t take my word on that).

Starting out – Reversing SOS

When I started the project, I knew a little about how SOS works internally, but nothing substantial.  My first goal was to learn as much as possible about how it works, then use that to write my own implementation.

I started by looking at mscordacwks.dll and sos.dll.  I knew the purpose of mscordacwks.dll was to abstract away the CLR data structures for external tools, so I figured that was the best place to start.  Using dumpbin (part of the Windows Platform SDK), I looked at the export table of mscordacwks.  Only a few functions are exported (I talked about OutOfProcFunctionTableCallback last post); the most interesting for this project is CLRDataCreateInstance.

Googling around for that function turned up two interesting hits.  The first was a link to MSDN which was fairly useless (why is it even documented?), and the second was a link to clrdata.idl on koderz (a great site) from the SSCLI.  For those unfamiliar with the SSCLI, it’s basically a dumbed-down version of the .NET 2.0 source MS released under a shared-source license.  I actually took this opportunity to download the SSCLI, which turned out to be worth it as I referred back to the source many times during this project.

The signature for CLRDataCreateInstance looks like this:

HRESULT CLRDataCreateInstance (
    [in]  REFIID           iid, 
    [in]  ICLRDataTarget  *target, 
    [out] void           **iface
);

So we need to figure out 1) the IID to create, and 2) what an ICLRDataTarget is.

Figuring out the IID we want

The implementation of CLRDataCreateInstance is at the bottom of clr\src\debug\daccess\daccess.cpp.  The function creates a ClrDataAccess object, then QIs it for the IID we passed to CLRDataCreateInstance.  The implementation of ClrDataAccess is also in daccess.cpp, and looking at its implementation of QueryInterface (and its class declaration), we can see that the only interface it supports that is useful to us is IXCLRDataProcess.  IXCLRDataProcess is defined in clr\src\inc\xclrdata.idl.  You can use midl to generate a .h file from this .idl file, or just use the one included in the SSCLI.  This will get us the IID of IXCLRDataProcess (5c552ab6-fc09-4cb3-8e36-22fa03c798b7).
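For reference, here is that IID written out as a struct initializer. This is only a sketch: the Guid type below is a portable stand-in with the same field layout as the Windows GUID struct, and the names Guid, kIidXclrDataProcess and FormatGuid are made up for illustration. In a real build you would normally get the IID from the midl-generated header instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Portable stand-in for the Windows GUID struct (same field layout).
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// IID of IXCLRDataProcess: 5c552ab6-fc09-4cb3-8e36-22fa03c798b7
constexpr Guid kIidXclrDataProcess = {
    0x5c552ab6, 0xfc09, 0x4cb3,
    {0x8e, 0x36, 0x22, 0xfa, 0x03, 0xc7, 0x98, 0xb7}
};

// Formats a Guid in the usual lowercase 8-4-4-4-12 hex form.
std::string FormatGuid(const Guid& g) {
    char buf[37];  // 36 characters plus the terminating NUL
    int written = std::snprintf(
        buf, sizeof(buf),
        "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",
        static_cast<unsigned>(g.data1),
        static_cast<unsigned>(g.data2),
        static_cast<unsigned>(g.data3),
        static_cast<unsigned>(g.data4[0]), static_cast<unsigned>(g.data4[1]),
        static_cast<unsigned>(g.data4[2]), static_cast<unsigned>(g.data4[3]),
        static_cast<unsigned>(g.data4[4]), static_cast<unsigned>(g.data4[5]),
        static_cast<unsigned>(g.data4[6]), static_cast<unsigned>(g.data4[7]));
    assert(written == 36);
    (void)written;
    return buf;
}
```

Formatting the constant back to text is a cheap way to check that the bytes were transcribed correctly from the .idl file.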

Implementing ICLRDataTarget

ICLRDataTarget is defined in clrdata.idl (and clrdata.h in the Platform SDK).  The interface defines a lot of methods, but very few of them actually seem to be used by the IXCLRData* implementations.  All we need is:

  • GetMachineType
    • I hard-coded mine to return IMAGE_FILE_MACHINE_AMD64.  A solution that also supports 32-bit systems would want to return IMAGE_FILE_MACHINE_I386 there.
  • GetPointerSize
    • This is the easiest one: return sizeof(void*).
  • GetImageBase
  • ReadVirtual

That's basically all you need to have a working ICLRDataTarget implementation.  (Side note: later on I found out that you can ask WinDBG for an IXCLRDataProcess through Ioctl and IG_GET_CLR_DATA_INTERFACE.  This has the advantage that WinDBG will try to load the "right" version of mscordacwks for you.  However, it doesn't work if you're not running inside WinDBG.  Conveniently though, the only case I can think of where you wouldn't be running inside WinDBG is when you're working against a live process, in which case it's fine to just load mscordacwks from the framework directory.)
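To make the shape of those four methods concrete, here is a rough, portable sketch. It is not the real COM interface and not the code from this project: SketchDataTarget, HResult and the constants are stand-ins invented for illustration, and the target "memory" is just a byte buffer. In the real tool these methods would presumably forward to the debugger engine (for example, ReadVirtual to IDebugDataSpaces::ReadVirtual).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

using HResult = std::int32_t;
constexpr HResult kOk   = 0;                                  // S_OK
constexpr HResult kFail = static_cast<HResult>(0x80004005u);  // E_FAIL

constexpr std::uint32_t kMachineI386  = 0x014c;  // IMAGE_FILE_MACHINE_I386
constexpr std::uint32_t kMachineAmd64 = 0x8664;  // IMAGE_FILE_MACHINE_AMD64

// Hypothetical stand-in for an ICLRDataTarget implementation. It serves one
// image out of an in-memory buffer instead of asking a debugger engine.
class SketchDataTarget {
public:
    SketchDataTarget(std::string imageName, std::uint64_t base,
                     std::vector<std::uint8_t> bytes)
        : imageName_(std::move(imageName)), base_(base), bytes_(std::move(bytes)) {}

    HResult GetMachineType(std::uint32_t* machineType) const {
        assert(machineType != nullptr);
        // Assumption: the tool has the same bitness as the target.
        *machineType = (sizeof(void*) == 8) ? kMachineAmd64 : kMachineI386;
        return kOk;
    }

    HResult GetPointerSize(std::uint32_t* pointerSize) const {
        assert(pointerSize != nullptr);
        *pointerSize = static_cast<std::uint32_t>(sizeof(void*));
        return kOk;
    }

    HResult GetImageBase(const std::string& imageName,
                         std::uint64_t* baseAddress) const {
        assert(baseAddress != nullptr);
        if (imageName != imageName_) return kFail;
        *baseAddress = base_;
        return kOk;
    }

    // Copies up to bytesRequested bytes starting at `address`; a read that
    // runs off the end of the region is truncated and reported in bytesRead.
    HResult ReadVirtual(std::uint64_t address, std::uint8_t* buffer,
                        std::uint32_t bytesRequested,
                        std::uint32_t* bytesRead) const {
        assert(buffer != nullptr && bytesRead != nullptr);
        *bytesRead = 0;
        if (address < base_ || address - base_ >= bytes_.size()) return kFail;
        const std::uint64_t offset = address - base_;
        const std::uint64_t available = bytes_.size() - offset;
        const std::uint32_t count = (bytesRequested < available)
            ? bytesRequested
            : static_cast<std::uint32_t>(available);
        std::memcpy(buffer, bytes_.data() + offset, count);
        *bytesRead = count;
        return kOk;
    }

private:
    std::string imageName_;
    std::uint64_t base_;
    std::vector<std::uint8_t> bytes_;
};
```

Backing the target with a plain buffer is also a handy way to unit test code that consumes the data target without attaching to anything.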

Putting it together – Resolving a managed IP to a MethodDesc

So at this point we have a working ICLRDataTarget implementation, we have an IID, and we have a way to create an object that implements that IID.  Calling CLRDataCreateInstance(__uuidof(IXCLRDataProcess), myDataTarget, (PVOID*)&pDac) gives us an instance of IXCLRDataProcess bound to our IDebugClient through our ICLRDataTarget implementation.

There are a few ways to resolve an IP to a method name now that we have an IXCLRDataProcess; I'll go over two of them.  The first is to use IXCLRDataProcess::GetRuntimeNameByAddress and pass an IP.  This is probably the simplest method, but it doesn't get you as much information.  In our case, however, all I wanted was the name, so this was enough.

IXCLRDataProcess::Request

The second brings us to what, in my opinion, is the most powerful feature of IXCLRDataProcess: the Request(…) method.  This is basically the ioctl of IXCLRDataProcess; it takes a request code plus an input buffer and an output buffer.  All the valid requests as of .NET 2.0 are defined in src\inc\dacprivate.h, and there are a lot of them.  Each of the output structs contains a Request method which sets up the input/output buffers correctly for that request.

Through experimentation, I've found that a lot of these structs have changed definitions between the Rotor snapshot and .NET 4.0.  Request returns E_INVALIDARG if the input or output buffers are mis-sized (but not only in that case).  There are two ways to figure out the correct buffer sizes:

  1. Disassemble the corresponding Request method in mscordacwks and look at what it's expecting for a buffer size.
  2. Set a breakpoint on ClrDataAccess::Request() and debug windbg+sos calling the method you want.

I usually went with #1 because it's a little faster.  However, you need to be creative figuring out how the structure changed, and then adjust the struct in dacprivate.h accordingly.

Back to resolving our managed IPs.  One DAC request is DacpMethodDescData.  This request is an example of one that changed between Rotor and .NET 4: the output buffer changed by 8 bytes (an x64 pointer).  I removed the managedDynamicMethodObject field from my definition to get it to work.  This request struct contains a couple of helper methods, one being RequestFromIP.  Giving this a managed IP will resolve it to a MethodDesc.  We can then take the MethodDescPtr from the result and pass it to GetMethodName, also on the DacpMethodDescData request.
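To illustrate why a single removed field matters, here is a small sketch of the size mismatch. The two structs are hypothetical placeholders, NOT the real DacpMethodDescData layouts (only MethodDescPtr and managedDynamicMethodObject are names taken from the text above), and CheckOutBuffer is an invented stand-in for whatever size validation the real Request handler performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using ClrDataAddress = std::uint64_t;  // CLRDATA_ADDRESS is 64 bits wide

// Hypothetical layouts: placeholder fields, not the real struct definitions.
struct MethodDescDataRotorSketch {
    ClrDataAddress MethodDescPtr;
    ClrDataAddress otherField;                  // placeholder
    ClrDataAddress managedDynamicMethodObject;  // the field removed in the post
};

struct MethodDescDataNet4Sketch {
    ClrDataAddress MethodDescPtr;
    ClrDataAddress otherField;                  // placeholder
};

// Dropping one 64-bit field shrinks the output buffer by exactly 8 bytes.
static_assert(sizeof(MethodDescDataRotorSketch) -
              sizeof(MethodDescDataNet4Sketch) == 8,
              "one CLRDATA_ADDRESS field is 8 bytes");

constexpr std::int32_t kOk = 0;                                           // S_OK
constexpr std::int32_t kInvalidArg = static_cast<std::int32_t>(0x80070057u);  // E_INVALIDARG

// Invented stand-in for the buffer-size check described above: any mismatch
// between the expected and supplied sizes is rejected.
std::int32_t CheckOutBuffer(std::size_t expectedSize, std::size_t suppliedSize) {
    assert(expectedSize > 0);
    return (expectedSize == suppliedSize) ? kOk : kInvalidArg;
}
```

Passing the older, larger struct to a handler that expects the newer size is rejected outright, which matches the E_INVALIDARG behavior described above.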

Conclusion

We've gone through a lot of work here, but at this point we can resolve any managed IP to a method name.  The workflow looks like this:

  • Using the IDebugClient from part 1, create our ICLRDataTarget implementation.
  • Pass said ICLRDataTarget to CLRDataCreateInstance with IID = __uuidof(IXCLRDataProcess).
  • For each frame, call GetRuntimeNameByAddress with the frame's IP; anything that succeeds is a managed method.  There may be cases where you'll have both a symbol name and a name from this call, in which case the GetRuntimeNameByAddress result should override the symbol name.
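The override rule in the last step can be sketched as a small helper. This assumes each lookup has already been run and leaves an empty string on failure; FrameNames and DisplayName are hypothetical names, and falling back to the raw address is an added assumption rather than something the workflow above specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Results of the two lookups for one frame; an empty string means "failed".
struct FrameNames {
    std::string symbolName;   // from the native symbol lookup
    std::string runtimeName;  // from GetRuntimeNameByAddress
};

// Picks the name to display: the managed name wins over the native symbol,
// and the raw instruction pointer is used when neither lookup succeeded.
std::string DisplayName(const FrameNames& names, std::uint64_t ip) {
    if (!names.runtimeName.empty()) return names.runtimeName;
    if (!names.symbolName.empty()) return names.symbolName;
    char buf[19];  // "0x" + 16 hex digits + NUL
    int written = std::snprintf(buf, sizeof(buf), "0x%016llx",
                                static_cast<unsigned long long>(ip));
    assert(written == 18);
    (void)written;
    return buf;
}
```

Keeping both names in the struct, rather than overwriting one with the other, leaves the native symbol available for diagnostics.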

There's definitely some room for improvement here.  One of the biggest downsides is that there's currently no logic to find the "right" version of mscordacwks.  SOS, for example, will search around to find the best match; I currently just load it from Framework64\v4…\mscordacwks.dll.

Next up: more advanced CLR inspection with IXCLRDataProcess.

Building a mixed-mode stack walker - Part 1

A project I’ve been working on recently is a tool to capture the stack trace of all running threads in a process.  The tool is used in response to a monitoring event to gather information about the process at the time of the event firing.  Gathering this information needs to be fast (sub-second, preferably <100ms), so using CDB, loading SOS (or sosex) and running ~*e!clrstack or ~*e!mk or similar wasn’t an option, since it takes far too long.  Also, as a secondary goal I wanted to be able to allow this to operate on a dump file as well as a live process, and also be as non-invasive as possible.  That ruled out using the CLR profiling APIs or MDbg (as a side note, it seems like MDbg tends to randomly kill OS handles in the process it’s attached to).

Try #1

My first attempt was to use dbghelp!StackWalk64 to get the full call stack; however, it had a lot of trouble traversing managed frames in an x64 process.  I'll talk a little bit about how x64 stack walking works and what problems I ran into.

An aside on x64 stack unwinding

In the x64 ABI, there's only one calling convention, and all code generators must use this convention in order for stack unwinding to work reliably.  An interesting part of the convention is how unwind data is stored so stack unwinding can happen at runtime.  x64's calling convention doesn't use a base pointer for each frame (EBP in x86), so there needs to be data somewhere about how to find the return address of each frame on the stack.  This data is actually baked into every DLL/EXE: the PE header's exception directory points at a function table (normally the .pdata section) that describes each function.
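As a rough illustration of how that table is used, here is a sketch of resolving an address to its function table entry. It assumes the entries are three 32-bit RVAs sorted by start address, which is how the x64 RUNTIME_FUNCTION table is laid out; the names RuntimeFunction and LookupFunctionEntry are made up here, and the real lookup is done by the OS (RtlLookupFunctionEntry) or by dbghelp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the shape of an x64 RUNTIME_FUNCTION entry: three image-relative
// addresses (RVAs).
struct RuntimeFunction {
    std::uint32_t beginAddress;  // RVA of the first byte of the function
    std::uint32_t endAddress;    // RVA one past the last byte
    std::uint32_t unwindData;    // RVA of the unwind info for the function
};

// Finds the entry covering `rva` in a table sorted by beginAddress.
// Returns nullptr when no entry covers it (for example a leaf function,
// which needs no unwind data because the return address is at the top of
// the stack).
const RuntimeFunction* LookupFunctionEntry(
        const std::vector<RuntimeFunction>& table, std::uint32_t rva) {
    assert(std::is_sorted(table.begin(), table.end(),
        [](const RuntimeFunction& a, const RuntimeFunction& b) {
            return a.beginAddress < b.beginAddress;
        }));
    // First entry that starts strictly after rva; the candidate is the one
    // just before it.
    auto it = std::upper_bound(table.begin(), table.end(), rva,
        [](std::uint32_t value, const RuntimeFunction& entry) {
            return value < entry.beginAddress;
        });
    if (it == table.begin()) return nullptr;
    --it;
    return (rva < it->endAddress) ? &*it : nullptr;
}
```

The unwinder then reads the unwind info at unwindData to work out how to recover the caller's stack pointer and return address.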

"But Steve!  How do you unwind a managed callstack?  There's no PE to embed the unwind data into, since it's JITed at runtime!"

Well, now we're jumping into semi-undocumented-land.  A function exists (RtlInstallFunctionTableCallback) that allows systems doing runtime codegen to handle the function table data themselves.  There's actually a great blog post that goes into more detail here.  The CLR uses this to install a callback function to provide function table data when requested.

"But Steve! How can you run that code when you're not in the same process!?!" (e.g. a debugger)

Well, thankfully the people at Microsoft thought about that.  The last parameter of RtlInstallFunctionTableCallback is the name of a DLL that exports a function named OutOfProcFunctionTableCallback.  Debuggers and similar tools can use this callback to access the function tables in cases where there isn't a live process or code can't be run in the process.  If you look at the exports (dumpbin /exports) of mscordacwks.dll, you'll see it exports OutOfProcFunctionTableCallback.

For "normal" (native) x64 frames, dbghelp provides SymFunctionTableAccess64 to resolve an IP to a function table entry (StackWalk64 calls this internally; it's usually passed as parameter 7, "FunctionTableAccessRoutine").  However, the built-in functions seem to break down on a mixed-mode stack.  In my attempts I couldn't get StackWalk64 to get past certain managed frames.  I got as far as trying to reverse engineer the function table linked list (you can get it with RtlGetFunctionTableListHead) and manually calling the callback in mscordacwks from my own callback installed with SymRegisterFunctionEntryCallback64, but was never successful.  If anyone knows how to get SymFunctionTableAccess64 to "play nice" with managed code, I'd be interested to hear.

Try #2

As an alternative, I looked into using the IDebugClient APIs exposed in dbgeng.dll.  This DLL is the core of windbg, cdb, etc., and the IDebug* APIs are actually very easy to use.  The biggest bonus is that any code you write using these APIs instantly supports both live and dump debugging, assuming you stick to only those APIs.  (Ironically, the steps I describe below only work on live targets, but they are fairly easy to adapt for dump debugging too.)

The IDebugClient COM object (and others) are all created through DebugCreate, and the workflow here is pretty simple.

  1. DebugCreate an IDebugClient
  2. Call AttachProcess on the client (in my case I used DEBUG_ATTACH_NONINVASIVE | DEBUG_ATTACH_NONINVASIVE_NO_SUSPEND which makes sure the debugger doesn't actually do anything to the process).
  3. QI the IDebugClient for IDebugControl4 (or DebugCreate it)
  4. For each thread in the target process
    1. OpenThread
    2.   SuspendThread
    3.     GetThreadContext
    4.     IDebugControl4::GetContextStackTrace
    5.   ResumeThread
    6. CloseHandle

Following this simple(?) 9 step process will get you a nice DEBUG_STACK_FRAME[] for each thread in the target process.  In my tests, the whole step 4 (the only invasive part of the process) took basically no measurable amount of time.  The slow part is symbol resolution (might need to hit a symbol server).
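The pairing in step 4 (every successful OpenThread gets a CloseHandle, every successful SuspendThread gets a ResumeThread, even when the capture in between fails) can be sketched with a scope guard. This is a portable illustration only: ThreadHooks, ScopeExit and CaptureOneThread are hypothetical names, and the hooks stand in for the Win32 and IDebugControl4 calls listed above.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs a callback when the enclosing scope exits, on every return path.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~ScopeExit() { if (fn_) fn_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
private:
    std::function<void()> fn_;
};

// Hypothetical hooks standing in for the calls in steps 4.1 to 4.6.
struct ThreadHooks {
    std::function<bool()> open;     // OpenThread
    std::function<bool()> suspend;  // SuspendThread
    std::function<bool()> capture;  // GetThreadContext + GetContextStackTrace
    std::function<void()> resume;   // ResumeThread
    std::function<void()> close;    // CloseHandle
};

// Captures one thread's stack. The thread is resumed and the handle closed
// in reverse order of acquisition, whether or not the capture succeeded.
bool CaptureOneThread(const ThreadHooks& hooks) {
    assert(hooks.open && hooks.suspend && hooks.capture &&
           hooks.resume && hooks.close);
    if (!hooks.open()) return false;
    ScopeExit closeGuard(hooks.close);    // step 6
    if (!hooks.suspend()) return false;
    ScopeExit resumeGuard(hooks.resume);  // step 5
    return hooks.capture();               // steps 3 and 4
}
```

Because the guards are destroyed in reverse order, the thread is always resumed before its handle is closed, which keeps the suspended window as short as the capture itself.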

You might be curious why IDebug* is able to walk mixed-mode stacks correctly while StackWalk64 can't.  Well, if you put a breakpoint on OutOfProcFunctionTableCallback in mscordacwks, you can see that IDebugControl passes a custom function table callback (dbgeng!SwFunctionTableCallback) to StackWalk64 rather than just using the stock SymFunctionTableAccess64 function in dbghelp.  I suspect there's some magic occurring internally that gets everything to work.

Putting it together: Symbol resolution

The final step of the native stack walk is resolving the IPs for the native frames to symbol names.  IDebugSymbols makes this simple with IDebugSymbols3::GetNameByOffsetWide, which is basically the equivalent of SymFromAddr (but supports Unicode symbols).  Again, you can just QI the IDebugClient instance from step 1 for IDebugSymbols3, then call GetNameByOffsetWide for each frame's IP.  It will fail for some of the managed frames (others, such as ones in ngen'd assemblies, might resolve "successfully") but will hopefully succeed for all the native frames.

Note you probably need to set up the symbol client with IDebugSymbols::SetSymbolPath.  One big "gotcha" with symbol server access is that, if your process is running as a service, the symbol server will try to use a proxy server unless explicitly told not to.  A full explanation is on MSDN here.

At this point, we have a full stack for every thread, and have resolved all the native frames to symbols.  Threads running managed code still have big gaps of unresolved frames.  Next up: Resolving the managed frames and getting more CLR diagnostic info.

(Part 2 here)