Intermittent hangs in an ASP.NET application when serving asmx files via HTTP GET

by steve 6. July 2011 23:03

I was recently debugging an intermittent, brief hang (5-15 seconds) in an ASP.NET application that would occur once every few days.  The application was fully precompiled (non-updatable) and running .NET 4.0.

When the hang occurred, most threads would be sitting in a stack similar to this (method signatures slightly modified for readability):

System.Web.Compilation.BuildManager.GetBuildResultFromCacheInternal
System.Web.Compilation.BuildManager.GetVPathBuildResultFromCacheInternal
System.Web.Compilation.BuildManager.GetVPathBuildResultInternal
System.Web.Compilation.BuildManager.GetVPathBuildResultWithNoAssert
System.Web.Compilation.BuildManager.GetVirtualPathObjectFactory
System.Web.Compilation.BuildManager.CreateInstanceFromVirtualPath
System.Web.UI.PageHandlerFactory.GetHandlerHelper
<snip>

With one thread in a stack like this:

System.Threading.WaitHandle.InternalWaitOne
System.CodeDom.Compiler.Executor.ExecWaitWithCaptureUnimpersonated
System.CodeDom.Compiler.Executor.ExecWaitWithCapture
Microsoft.CSharp.CSharpCodeGenerator.Compile
Microsoft.CSharp.CSharpCodeGenerator.FromFileBatch
Microsoft.CSharp.CSharpCodeGenerator.CompileAssemblyFromFileBatch
System.Web.Compilation.AssemblyBuilder.Compile
System.Web.Compilation.BuildProvidersCompiler.PerformBuild
System.Web.Compilation.BuildManager.CompileWebFile
System.Web.Compilation.BuildManager.GetVPathBuildResultInternal
System.Web.Compilation.BuildManager.GetVPathBuildResultWithNoAssert
System.Web.Compilation.BuildManager.GetVPathBuildResult
System.Web.UI.PageParser.GetCompiledPageInstance
System.Web.UI.PageParser.GetCompiledPageInstance
System.Web.Services.Protocols.DocumentationServerProtocol.GetCompiledPageInstance
System.Web.Services.Protocols.DocumentationServerProtocol.Initialize
System.Web.Services.Protocols.ServerProtocolFactory.Create
System.Web.Services.Protocols.WebServiceHandlerFactory.CoreGetHandler
System.Web.Services.Protocols.WebServiceHandlerFactory.GetHandler
System.Web.Script.Services.ScriptHandlerFactory.GetHandler
<snip>

(Note: I didn't have the parameters for these stack traces.  Had I had them, the diagnosis would have been much simpler in hindsight.)

Every time this happened it'd be an ASMX file triggering the compile, never any other type.  I was pretty perplexed by this because:

  1. The application is fully precompiled, so ASP.NET should never attempt to compile anything.
  2. There's not really anything to compile in our ASMX files (they're just pointers to a class in a DLL), so even if ASP.NET did try to compile it, it shouldn't take any time.

After a few days of continuing to be perplexed, I had a breakthrough.

In ASP.NET, if you just hit an ASMX file via a browser, you get a pretty web page that lists all the methods on the service and allows you to invoke methods through the browser.  This page is actually an ASPX file (named DefaultWsdlHelpGenerator.aspx) that lives in the Config directory of whatever framework you're running (in my case it was C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config, ymmv).  Bingo, that's what's getting compiled!

Once I figured this out, the solution was fairly simple.  ASP.NET allows you to specify the WSDL help generator file via system.web/webServices/wsdlHelpGenerator in your web.config.  I simply copied DefaultWsdlHelpGenerator.aspx into my application's root directory and added the wsdlHelpGenerator element to my web.config.  Now, when my application needs to generate a WSDL file or page, ASP.NET uses the precompiled help generator instead of the default one and doesn't trigger a compile.  Problem solved!
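For reference, the web.config change might look roughly like the fragment below.  This is a sketch, assuming the copied file keeps its original name and sits in the application root; adjust the href if you put it somewhere else.

```xml
<configuration>
  <system.web>
    <webServices>
      <!-- Assumption: DefaultWsdlHelpGenerator.aspx was copied into the application root -->
      <wsdlHelpGenerator href="DefaultWsdlHelpGenerator.aspx" />
    </webServices>
  </system.web>
</configuration>
```

Because the copied .aspx now lives inside the site, it should get picked up by precompilation along with the rest of the application.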

The moral of the story here: if you have a precompiled site with asmx services and care about short hangs, do what I outlined above.  If you don't precompile, this is the least of your worries.

Tags:

debugging | asp.net

SPT - x86 Release

by steve 24. June 2011 17:01

Read more about SPT here

I've compiled the latest build of SPT for x86.  All functionality should be working fine, but since I don't do much x86 debugging it hasn't been as thoroughly tested as the x64 build.  Let me know if you run into any issues.

This is actually a later build than the x64 version and has some enhanced features, like !findtimers to dump all the active timer objects in the process and enhanced output for some commands.  I'll get the x64 version update posted soon as well.

Download it here

Tags:

clr | debugging | x86 | windbg

SPT - A WinDBG extension for debugging .NET applications

by steve 7. June 2011 00:36

In my spare time I've been working on a WinDBG extension containing some useful methods that automate pretty common .NET debugging tasks for me, and hopefully other people as well.  The extension right now targets .NET 4/x64 only, as that's my primary development platform.  I'll probably make an x86 build soon (especially if people request it); it shouldn't be very difficult to make it work on 32 bit platforms.  Currently a few commands support DML (!DumpSqlConnectionPools is the most noticeable); I'll be adding support for DML to other commands soon.

A few of the commands are targeted at ASP.NET/server apps (!DumpHttpContext, !DumpASPNetRequests, !DumpSqlConnectionPools, !DumpThreadPool for example), but most are fairly general purpose.  In the next few blog posts I'll go into details about the implementation of each of the commands, but for now here's a list of them with quick summaries and examples.  Feel free to comment/email me with any comments/suggestions.

Download the extension here (x64)

And the x86 version here

 

ASP.NET

!DumpASPNetRequests

Prints out all threads with an HTTP context.
Example output:

ThreadID  HttpContext       StartTime             URL + QS 
0123      0000000002f62b38  05/29/2011 18:23:02   /hello/world.aspx  
0124      0000000002f62c98  05/29/2011 18:23:05   /test.aspx?w00t=1
...
 

!FindHttpContext

Prints the context information for the current thread, including the managed thread object address (System.Threading.Thread), ExecutionContext, and Logical/IllogicalCallContext.
Example output:

OS ThreadId: e64, N: 0
Managed Thread @ 0000000002f58580

Thread ExecutionContext @ 0000000002f64af8
IllogicalCallContext @ 0000000002f64c40
HostContext (HttpContext) @ 0000000002f64d80

Note: of all the commands here, this is the most likely to break from service releases.  It relies on reverse-engineered offsets to find the managed thread object from the unmanaged thread object.

Hashtables/Dictionaries/Etc

!DumpDictionary (!DumpHashtable, !ddct, !dht)

Note: I use hashtable/dictionary pretty interchangeably for the rest of this post; anything that supports Hashtables also supports Dictionaries.

Iterates over all the key/value pairs in a dictionary, printing them out, and translating keys/values to strings if possible.
Example:

0:000> !ddct 0000000002f621c0  
entry at: 0000000002f622e8
key: 0000000000000001 [(null), 1]
value: 0000000002f62128 [System.String, "test"]
hashCode: 0x00000001

 

entry at: 0000000002f62300
key: 0000000000000002 [(null), 2]
value: 0000000002f62150 [System.String, "two"]
hashCode: 0x00000002

These methods have a few switches you can use.

  • The first is -short, which prints only the key/value/hash triplet for each entry.  This is useful for feeding into a script.
  • The second is the pair -kdo and -vdo.  These switches allow you to pass a tokenized command in for the key/value in each entry.  For instance, say you want to print out all the values in the dictionary, you could run:

    !ddct -vdo "!do %p" 0000000002f621c0

These switches are one of my favorite features and have saved me a ton of time already.
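To make the token substitution concrete, here is a minimal C++ sketch of how a "%p" template could be expanded for one entry.  The helper name ExpandCommandTemplate is hypothetical and this is not SPT's actual implementation; it only assumes that "%p" stands for the key or value address, as in the example above.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not SPT's real code): replace every "%p" in a command
// template with the given address, formatted as 16 hex digits the way the
// debugger prints x64 pointers.
std::string ExpandCommandTemplate(const std::string& tmpl, std::uint64_t address)
{
    char hex[17];
    std::snprintf(hex, sizeof(hex), "%016llx",
                  static_cast<unsigned long long>(address));

    std::string result;
    std::size_t pos = 0;
    while (true) {
        std::size_t hit = tmpl.find("%p", pos);
        if (hit == std::string::npos) {
            // No more tokens: copy the tail of the template and stop.
            result.append(tmpl, pos, std::string::npos);
            break;
        }
        result.append(tmpl, pos, hit - pos); // text before the token
        result.append(hex);                  // the token itself
        pos = hit + 2;                       // skip past "%p"
    }
    return result;
}
```

With the value address from the example output, ExpandCommandTemplate("!do %p", 0x0000000002f62128) yields "!do 0000000002f62128", which an extension could then hand to the debugger engine to execute once per entry.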

Note: DumpDictionary and DumpHashtable are both actually the same function, the export table links them together.

Note 2: This also supports the HybridDictionary class, just use any of the above functions on it.

 

!DumpMemoryCache (!dmc)

Basically the same as !DumpDictionary, but goes through all the sub-hashtables inside the memory cache object.  Doesn't support the same switches as !DumpDictionary yet though.

 

Other misc stuff

!FullStack (!fs)

Prints out the mixed-mode callstack of the current thread.
Example:  (parameters snipped by me so it doesn't wrap on my blog)

       IP        M  Name 

000000007745153a 0 ntdll!ZwRequestWaitReplyPort+a
0000000076d42f58 0 KERNEL32!ConsoleClientCallServer+54
0000000076d75321 0 KERNEL32!ReadConsoleInternal+1f1
0000000076d8a762 0 KERNEL32!ReadConsoleA+b2
0000000076d58e04 0 KERNEL32!TlsGetValue+899e
000007feef4f17c7 0 clr!DoNDirectCall__PatchGetThreadCall+7b
000007feec6834a1 1 DomainNeutralILStubClass.IL_STUB_PInvoke(...)+101
000007feece2f58a 1 System.IO.__ConsoleStream.ReadFileNative(...)+ba
000007feece2f3f2 1 System.IO.__ConsoleStream.Read(...)+62

(Note: I've seen some versions of dbghelp have problems resolving native methods in CLR.dll to symbols.  The latest version seems to work fine though.)

 

!FormatDT

Given a System.DateTime or TimeSpan, formats it as a string MM/dd/yyyy hh:mi:ss.  I think psscor2 had something similar back in the day.
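As an illustration of what such a command has to do, here is a hedged C++ sketch that formats the raw 64-bit value of a DateTime.  It assumes the value was read from the DateTime's private 64-bit field (dateData in .NET 4), where the low 62 bits are ticks of 100 ns since 01/01/0001 and the top two bits hold the DateTimeKind.  FormatDateTime is a hypothetical name, not SPT's code, and the sketch only handles dates from 1970 onwards.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// The low 62 bits of the raw value are ticks; the top 2 bits are the Kind.
constexpr std::uint64_t kTicksMask      = 0x3FFFFFFFFFFFFFFFULL;
constexpr std::uint64_t kTicksPerSecond = 10000000ULL;           // 100 ns ticks
constexpr std::uint64_t kUnixEpochTicks = 621355968000000000ULL; // 01/01/1970

// Hypothetical helper: format a raw DateTime value as MM/dd/yyyy HH:mm:ss
// (24-hour clock).  Dates before 1970 are not handled to keep this short.
std::string FormatDateTime(std::uint64_t dateData)
{
    std::uint64_t ticks = dateData & kTicksMask; // strip the Kind bits
    if (ticks < kUnixEpochTicks)
        return "(before 1970, not handled in this sketch)";

    std::time_t seconds =
        static_cast<std::time_t>((ticks - kUnixEpochTicks) / kTicksPerSecond);
    std::tm* utc = std::gmtime(&seconds); // no time zone adjustment is applied
    if (utc == nullptr)
        return "(out of range)";

    char buffer[32];
    std::strftime(buffer, sizeof(buffer), "%m/%d/%Y %H:%M:%S", utc);
    return std::string(buffer);
}
```

A TimeSpan is simpler since it is just a signed tick count with no Kind bits; the same division by ticks-per-second applies.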

 

!EvalExpression (!evalexpr)

Given an expression consisting of field access or hashtable lookups, evaluates the expression starting at a root object.

Fields are separated by periods just like C++/C#.  Hashtable lookups are specified by the [ ] operator (like in C#) and are done in two ways.

  • The first is if the hashtable/dictionary key is a string, you can use a quoted string inside the [ ] operator to tell the evaluator you want to match a string.  For example _myHashtable["steve"].lastName will look up the value in _myHashtable with the key of "steve", then evaluate the lastName field.
  • The second is by hashcode.  If the expression inside the [ ] operator is an unquoted number, it's used to do a lookup by hashcode.

Also, if the root object itself is a hashtable, it's legal for the expression to begin with simply a [ ] operator.  For example, ["steve"].lastName.

Expressions ending with a "$" sign are evaluated to a string and the result string printed.  If the final result isn't a string object, stuff will probably break.

Note: Hashtable lookups are actually O(n) (you'd need a full-out IL interpreter like in VS to do it in O(1)), so they might get slow on really big dictionaries.
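The expression grammar described above is small enough to sketch.  The following C++ is a hypothetical parser, not the one in SPT: it splits an expression into field steps, quoted-string key steps and hashcode steps, and records whether the expression ended with "$".  All type and function names are made up for illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

enum class StepKind { Field, StringKey, HashKey };

struct Step {
    StepKind kind = StepKind::Field;
    std::string text;       // field name, or the string key
    std::uint32_t hash = 0; // only meaningful for HashKey
};

struct ParsedExpression {
    std::vector<Step> steps;
    bool printAsString = false; // expression ended with '$'
};

// Hypothetical parser for expressions such as _myHashtable["steve"].lastName$
ParsedExpression ParseExpression(std::string expr)
{
    ParsedExpression parsed;
    if (!expr.empty() && expr.back() == '$') {
        parsed.printAsString = true;
        expr.pop_back();
    }

    std::size_t i = 0;
    while (i < expr.size()) {
        if (expr[i] == '.') {
            ++i; // separator between steps
        } else if (expr[i] == '[') {
            Step step;
            std::size_t close;
            if (i + 1 < expr.size() && expr[i + 1] == '"') {
                // ["key"]: scan to the closing quote so the key may contain ']'
                std::size_t endQuote = expr.find('"', i + 2);
                if (endQuote == std::string::npos ||
                    endQuote + 1 >= expr.size() || expr[endQuote + 1] != ']')
                    throw std::invalid_argument("unterminated string key");
                step.kind = StepKind::StringKey;
                step.text = expr.substr(i + 2, endQuote - (i + 2));
                close = endQuote + 1;
            } else {
                // [123] or [0x7b]: an unquoted number is a lookup by hashcode
                close = expr.find(']', i);
                if (close == std::string::npos)
                    throw std::invalid_argument("missing ']'");
                step.kind = StepKind::HashKey;
                step.hash = static_cast<std::uint32_t>(
                    std::stoul(expr.substr(i + 1, close - i - 1), nullptr, 0));
            }
            parsed.steps.push_back(step);
            i = close + 1;
        } else {
            // A field name runs until the next '.' or '['
            std::size_t end = expr.find_first_of(".[", i);
            if (end == std::string::npos) end = expr.size();
            Step step;
            step.kind = StepKind::Field;
            step.text = expr.substr(i, end - i);
            parsed.steps.push_back(step);
            i = end;
        }
    }
    return parsed;
}
```

An evaluator would then walk the steps from the root object: a Field step reads the named field, while a StringKey or HashKey step scans the table's entries one by one, which is where the O(n) cost mentioned above comes from.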

 

!PrintDelegateMethod (!pdm)

Attempts to resolve a delegate to a method name.  This should work at least for non-multicast delegates (i.e. delegates without an invocation list) to static or instance methods.  For instance method calls, the value in _methodPtr is simply a pointer to the managed method and is resolved in a way similar to ip2md.  For static methods, the value in _methodPtrAux is an entry into the JIT stub table, and is used to get a MethodDesc handle to the method, which is then resolved to a name.

 

!DumpThreadPool

Dumps the items currently in the .NET 4.0 ThreadPool queues, resolving work items to method names.  This currently supports work items enqueued via ThreadPool.QueueUserWorkItem and Task[<T>].  These are the only two classes in .NET 4 that currently implement IThreadPoolWorkItem.  This method only works if ThreadPoolGlobals.useNewWorkerQueue = true.

There's also a special case: if a work item was created via BeginInvoke, DumpThreadPool will attempt to resolve it to the real method being called by looking at the MethodCall object's MethodInfo field.  For this to work, the method being called must be System.Runtime.Remoting.Proxies.AgileAsyncWorkerItem.ThreadPoolCallBack, and the callback argument must be a System.Runtime.Remoting.Proxies.AgileAsyncWorkerItem.

 

!DumpSqlConnectionPools (!dsql)

Dumps out all SQL connection pools in all app domains of the process.

Example output:

0:046>  !dumpsqlconnectionpools;
[0] Factory @ 00000001df0614c8
Connection String: <snip>
PoolGroup:    000000011f0877a8
    SID:  S-1-5-21-3430578067-4192788304-1690859819-9294
    Pool: 00000001bf077d08, State: 1, Waiters: 0, Size: 12
    Connections:
              ConnPtr           State  #Async  Created            Command/Reader    Timeout  Text
        [000] 00000001bf0788e8      1       0  6/10/2011 17:30:32
        [001] 000000015f1ebf80      1       0  6/10/2011 17:30:33
        [002] 000000013f30da08      1       0  6/10/2011 17:30:39
        [003] 000000013f3315e0      1       0  6/10/2011 17:30:54
        [004] 000000011f4852f8      1       0  6/10/2011 17:30:54
        [005] 00000001df1b5998      1       0  6/10/2011 17:30:54
        [006] 00000000ff41da90      1       0  6/10/2011 17:30:54
        [007] 00000001bf1c08f8      1       0  6/10/2011 17:30:54
        [008] 000000019f5f8080      1       0  6/10/2011 17:30:54
        [009] 000000015f2b4f90      1       0  6/10/2011 17:30:54
        [00a] 000000013f344120      1       0  6/10/2011 17:30:54
        [00b] 000000019f60ab18      1       0  6/10/2011 17:30:54

Connection String: <snip>;
PoolGroup:    000000017f125730
    SID:  S-1-5-21-3430578067-4192788304-1690859819-9294
    Pool: 000000017f4495b8, State: 1, Waiters: 0, Size: 4
    Connections:
              ConnPtr           State  #Async  Created            Command/Reader    Timeout  Text
        [000] 000000017f449f50      1       0  6/10/2011 17:30:54
        [001] 000000017f471280      1       0  6/10/2011 17:30:54
        [002] 00000000ff4305b8      1       0  6/10/2011 17:30:54
        [003] 00000000ff442f50      1       1  6/10/2011 17:30:54  000000017f4372a0       50  do_something_prc

What are we seeing here?  Connection pooling is basically a 5 level hierarchy:

  • Connection String - top level, defines the server and connection options.
    • Pool group - Second level, varies based on the identity used to connect to the DB if using SSPI
      • Pool - Actually contains the SqlConnection instances.
        • SqlConnection - The DB connection instance
          • Active readers - SqlDataReaders that are actively doing something

Some fields that probably need some explaining:

  • Pool -> State
    • Corresponds to the System.Data.ProviderBase.DbConnectionPool+State enum, Initializing, Running, Shutting Down
  • Connection -> State
    • Corresponds to the System.Data.ConnectionState enum.
  • Created is the time the connection was first opened in UTC.
  • Command/Reader
    • If Async = 1, the pointer is to a command, otherwise it's to a SqlDataReader.  With a SqlCommand, you can get to the reader via the _cachedAsyncState field, with a SqlDataReader, you can get to the command via the _command field.

Note: This doesn't currently dump connections in the transacted connection pools.

 

!FindTimers

(x86 only for now)

Dumps out all timers registered in the process.

Example:

0:006> !findtimers
Handle            TimerObj          ADID  Period[ms]  State             Context           Delegate          Method
          293e48  00000000022cbbb4     1        1000  00000000022cbb48  00000000022cbbc8  00000000022cbb6c  SomeCallback(System.Object)

Columns are:

  • Handle
    • The timer handle, can be used to cross reference with an existing TimerBase object
  • TimerObj
    • A reference to the managed _TimerCallback object
  • ADID
    • The appdomain ID the timer is associated with
  • State
    • The state object associated with the constructor.
  • Context
    • The ExecutionContext captured by the timer.
  • Delegate
    • The delegate that will be called when the timer fires.

!SPTHelp

Prints out the built-in help.

Tags:

clr | debugging | windbg | x64 | x86

Getting the managed System.Threading.Thread instance for a native thread object (ThreadObj)

by steve 28. April 2011 20:41

There are times when I’ve needed to find the managed thread object backing the native object output in !threads.  The way I used to get it was to dump all thread objects (!dumpheap -mt <thread MT>) and match the managed thread Id field with the output from !threads.  However, there’s a better way. 

There’s a method on the native thread object “clr!Thread::GetExposedObjectRaw”, which is a simple 3 instruction method.  Disassembling it we can see that a pointer to the managed thread object is 0x228 bytes into the native thread object.  So, simply, the managed thread object is @ poi(poi(ThreadObj+0x228)).

I assume this is also there in the 2.0 CLR but I haven’t looked.  Also, this offset is for the 64 bit CLR, on a 32 bit system it’s at ThreadObj+0x160.  Obviously these offsets are subject to change with any CLR patch, so don’t depend on them in any production code unless you understand the risks.
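The double dereference can be sketched in C++ as follows.  This is an illustration only: the read callback is a hypothetical stand-in for whatever reads pointer-sized values from the target (in a debugger extension it could wrap something like IDebugDataSpaces::ReadPointersVirtual), and the offsets are the ones quoted above, which may not hold for other CLR builds.

```cpp
#include <cstdint>
#include <functional>

// Offsets quoted in the post; subject to change with any CLR patch.
constexpr std::uint64_t kExposedObjectOffsetX64 = 0x228;
constexpr std::uint64_t kExposedObjectOffsetX86 = 0x160;

// Hypothetical callback: reads one pointer-sized value from the target
// process or dump.  Returns false if the address cannot be read.
using ReadPointerFn = std::function<bool(std::uint64_t address, std::uint64_t* value)>;

// Equivalent of poi(poi(ThreadObj+offset)): the first read fetches the value
// stored in the native thread object, the second read follows it to the
// managed System.Threading.Thread instance.
bool GetManagedThreadObject(std::uint64_t threadObj,
                            bool is64Bit,
                            const ReadPointerFn& readPointer,
                            std::uint64_t* managedThread)
{
    const std::uint64_t offset =
        is64Bit ? kExposedObjectOffsetX64 : kExposedObjectOffsetX86;

    std::uint64_t slot = 0;
    if (!readPointer(threadObj + offset, &slot) || slot == 0)
        return false; // unreadable, or no managed object exposed yet

    return readPointer(slot, managedThread);
}
```

Passing the ThreadObj value printed by !threads and checking the result with !do is a quick way to confirm the offset still holds on a given CLR build.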

Tags:

clr | debugging | sos | x64 | x86

Building a mixed-mode stack walker - Part 2

by steve 28. April 2011 20:40

(Part 1 is here)

When I left off in Part 1, I had a stack-walker based on IDebug* that could successfully unwind a mixed-mode stack and resolve the native frames to symbols, but the managed sections of the stack were still unresolved.  In this post I'll talk about how to resolve those managed frames to managed MethodDescs and turn those into names, all without using ICorDebug or the CLR profiling APIs.  Also, this method will work on both live and dump targets.  Just a friendly warning here though: none of what follows is documented or supported by MS, and all of it is subject to change at any time.  Also, by using headers/idl from the SSCLI, you may be taking a dependency on that licensing (but I’m not a lawyer so don’t take my word on that.)

Starting out – Reversing SOS

When I started the project, I knew a little about how SOS works internally, but nothing substantial.  My first step was to learn as much about how it worked as possible, then use that to write my own implementation.

I started by looking at mscordacwks.dll and sos.dll.  I knew the purpose of mscordacwks.dll was to abstract away the CLR data structures to external tools, so I figured that was the best start.  Using dumpbin (part of the Windows Platform SDK) I looked at the export table of mscordacwks.  Only a few functions are exported (I talked about OutOfProcFunctionTableCallback last post), the most interesting for this project is CLRDataCreateInstance.

Googling around for that function turned up two interesting hits.  The first was a link to MSDN which was fairly useless (why is it even documented?), and the second was a link to clrdata.idl on koderz (a great site) from the SSCLI.  For those unfamiliar with the SSCLI, it’s basically a dumbed-down version of the .NET 2.0 source MS released under a shared-source license.  I actually took this opportunity to download the SSCLI, which turned out to be worth it as I referred back to the source many times during this project.

The signature for CLRDataCreateInstance looks like this:

HRESULT CLRDataCreateInstance (
    [in]  REFIID           iid, 
    [in]  ICLRDataTarget  *target, 
    [out] void           **iface
);

So we need to figure out 1) the IID to create, and 2) what an ICLRDataTarget is.

Figuring out the IID we want

The implementation of CLRDataCreateInstance is in clr\src\debug\daccess\daccess.cpp at the bottom of the file.  The function creates a ClrDataAccess object, then QIs it for the IID we passed to CLRDataCreateInstance.  The implementation of ClrDataAccess is also in daccess.cpp, and looking at its implementation of QueryInterface (and class declaration), we can see that the only useful interface (to us) it supports is IXCLRDataProcess.  IXCLRDataProcess is defined in clr\src\inc\xclrdata.idl.  You can use midl to generate a .h file from this .idl file, or just use the one included in the SSCLI.  This will get us the IID of IXCLRDataProcess (5c552ab6-fc09-4cb3-8e36-22fa03c798b7).

Implementing ICLRDataTarget

ICLRDataTarget is defined in clrdata.idl (and clrdata.h in the platform SDK).  The interface defines a lot of methods, but actually very few of them seem to be used by the IXCLRData* implementations.  All we need is:

  • GetMachineType
    • I hard-coded mine to return IMAGE_FILE_MACHINE_AMD64; in a cross-platform solution you'd want to return IMAGE_FILE_MACHINE_I386 on 32 bit systems as well.
  • GetPointerSize
    • This is the easiest one, return sizeof(void*).
  • GetImageBase
  • ReadVirtual

That's basically all you need to have a working ICLRDataTarget implementation.  (Side note: later on I found out that you can ask WinDBG for an IXCLRDataProcess through Ioctl and IG_GET_CLR_DATA_INTERFACE.  This has the advantage that WinDBG will try to load the "right" version of mscordacwks for you.  However, it doesn't work if you're not running inside WinDBG.  Conveniently though, the only case I can think of where you wouldn't be running inside WinDBG would be doing something to a live process, in which case it's fine to just load mscordacwks from the framework directory.)

Putting it together – Resolving a managed IP to a MethodDesc

So at this point we have a working ICLRDataTarget implementation, we have an IID, and we have a way to create that IID.  Using CLRDataCreateInstance(__uuidof(IXCLRDataProcess), myDataTarget, (PVOID*)&pDac), we get an instance of IXCLRDataProcess bound to our IDebugClient through our ICLRDataTarget implementation.

There are a few ways to resolve an IP to a method name now that we have an IXCLRDataProcess; I'll go over two of them.  The first is to use IXCLRDataProcess::GetRuntimeNameByAddress and pass an IP.  This is probably the simplest method, but doesn't get you as much information.  In our case however, all I wanted was the name, so this was enough.

IXCLRDataProcess::Request

The second brings us to what, in my opinion, is the most powerful feature of IXCLRDataProcess, the Request(…) method.  This is basically the IOCTL of IXCLRDataProcess; it takes a request code, and an input + output buffer.  All the valid requests as of .NET 2.0 are defined in src\inc\dacprivate.h, and there are a lot of them.  All of the output structs contain a Request method which sets up the input/output buffers correctly based on the request.

Through experimentation, I've found a lot of these structs have changed definitions between the Rotor snapshot and .NET 4.0.  Request returns E_INVALIDARG if the input or output buffers are mis-sized (but not only in that case.)  There are two ways to figure out the correct buffer sizes:

  1. Disassemble the corresponding Request method in mscordacwks and look at what it's expecting for a buffer size.
  2. Set a breakpoint on ClrDataAccess::Request() and debug windbg+sos calling the method you want.

I usually went with #1 because it's a little faster.  However, you need to be creative figuring out how the structure changed, and then adjust the struct in dacprivate.h accordingly.

Back to resolving our managed IPs.  One DAC Request is DacpMethodDescData.  This request is an example of one that changed between Rotor and .NET 4: the output buffer changed by 8 bytes (an x64 pointer).  I removed the managedDynamicMethodObject field from my definition to get it to work.  This request struct contains a couple of helper methods, one being RequestFromIP.  Giving this a managed IP will resolve it to a MethodDesc.  We can then take the MethodDescPtr from the result and pass it to GetMethodName, also on the DacpMethodDescData request.

Conclusion

We've gone through a lot of work here, but at this point we can resolve any managed IP to a method name.  The workflow looks like this:

  • Using the IDebugClient from part 1, create our ICLRDataTarget implementation.
  • Pass said ICLRDataTarget to CLRDataCreateInstance with IID = __uuidof(IXCLRDataProcess).
  • For each frame, call GetRuntimeNameByAddress with the frame's IP; anything that succeeds is a managed method.  There may be cases where you'll have both a symbol name and a name from this call, in which case the name from GetRuntimeNameByAddress should override the symbol name.
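The merge rule in the last step can be sketched like this in C++.  The two resolver callbacks are hypothetical stand-ins: one would wrap the GetRuntimeNameByAddress call and the other the native symbol lookup from part 1.  Only the precedence logic is shown, not the real API calls.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical resolver: returns true and fills `name` when the lookup succeeds.
using NameResolver = std::function<bool(std::uint64_t ip, std::string* name)>;

struct ResolvedFrame {
    std::uint64_t ip = 0;
    std::string name;
    bool managed = false;
};

// For each IP, prefer the runtime (managed) name; fall back to the native
// symbol name; mark the frame unresolved if neither lookup succeeds.
std::vector<ResolvedFrame> ResolveFrames(const std::vector<std::uint64_t>& ips,
                                         const NameResolver& runtimeName,
                                         const NameResolver& symbolName)
{
    std::vector<ResolvedFrame> frames;
    for (std::uint64_t ip : ips) {
        ResolvedFrame frame;
        frame.ip = ip;
        if (runtimeName(ip, &frame.name)) {
            frame.managed = true;          // runtime name wins, even if a symbol exists
        } else if (!symbolName(ip, &frame.name)) {
            frame.name = "<unresolved>";   // neither lookup knew this IP
        }
        frames.push_back(frame);
    }
    return frames;
}
```

Asking the runtime first means frames in ngen'd assemblies, which might also resolve "successfully" through the symbol engine, still end up with their managed names.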

There's definitely some room for improvement here.  One of the biggest downsides is that there's no logic currently to find the "right" version of mscordacwks.  SOS, for example, will try to search around to find the best match, whereas I currently just load it from Framework64\v4…\mscordacwks.dll.

Next up: more advanced CLR inspection with IXCLRDataProcess.

Tags:

clr | DAC | debugging | diagnostics | IDebugClient | IXCLRDataProcess | stack-walk | x64

Building a mixed-mode stack walker - Part 1

by steve 28. April 2011 20:38

A project I’ve been working on recently is a tool to capture the stack trace of all running threads in a process.  The tool is used in response to a monitoring event to gather information about the process at the time of the event firing.  Gathering this information needs to be fast (sub-second, preferably <100ms), so using CDB, loading SOS (or sosex) and running ~*e!clrstack or ~*e!mk or similar wasn’t an option, since it takes far too long.  Also, as a secondary goal I wanted to be able to allow this to operate on a dump file as well as a live process, and also be as non-invasive as possible.  That ruled out using the CLR profiling APIs or MDbg (as a side note, it seems like MDbg tends to randomly kill OS handles in the process it’s attached to).

Try #1

My initial attempts were to use dbghelp!StackWalk64 to get the full callstack, however, it had a lot of trouble traversing through managed frames on an x64 process.  I'll talk a little bit about how x64 stack walking works and what the problems I ran into were.

An aside on x64 stack unwinding

In the x64 ABI, there's only one calling convention, and all code generators must use this convention in order for stack unwinding to work reliably.  An interesting part of the convention is how unwind data is stored so stack unwinding can happen at runtime.  x64's calling convention doesn't use a base pointer for each frame (EBP in x86), so there needs to be data somewhere about how to find the return address of each frame on the stack.  This data is actually baked into every DLL/EXE: the PE header's exception directory points at a function table (the .pdata section) with entries describing each non-leaf function.
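To give a feel for what a function table lookup involves, here is a simplified C++ sketch.  The struct mirrors the three RVA fields of an x64 RUNTIME_FUNCTION entry, but the lookup function is an illustrative stand-in written for this example, not the OS implementation (the real entry point for this is RtlLookupFunctionEntry).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Mirrors the x64 RUNTIME_FUNCTION layout: all three fields are RVAs
// (offsets from the module's load address).
struct RuntimeFunction {
    std::uint32_t BeginAddress; // RVA of the first byte of the function
    std::uint32_t EndAddress;   // RVA one past the last byte of the function
    std::uint32_t UnwindData;   // RVA of the unwind info for the function
};

// Illustrative lookup: find the entry covering `rva` in a table sorted by
// BeginAddress.  Returns nullptr when no entry covers it, which is what
// happens for leaf functions that need no unwind data.
const RuntimeFunction* LookupFunctionEntry(const std::vector<RuntimeFunction>& table,
                                           std::uint32_t rva)
{
    // Find the first entry that starts after rva...
    auto it = std::upper_bound(
        table.begin(), table.end(), rva,
        [](std::uint32_t value, const RuntimeFunction& entry) {
            return value < entry.BeginAddress;
        });
    if (it == table.begin())
        return nullptr;
    --it; // ...so the only candidate is the entry just before it.
    return (rva < it->EndAddress) ? &*it : nullptr;
}
```

An unwinder repeats this for every frame: look up the entry for the current IP, use its unwind data to undo the prolog and recover the return address, then continue with the caller's IP.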

"But Steve!  How do you unwind a managed callstack?  There's no PE to embed the unwind data into, since it's JITed at runtime!"

Well, now we're jumping into semi-undocumented-land.  A function exists (RtlInstallFunctionTableCallback) that allows systems doing runtime codegen to handle the function table data themselves.  There's actually a great blog post that goes into more detail here.  The CLR uses this to install a callback function to provide function table data when requested.

"But Steve! How can you run that code when you're not in the same process!?!" (e.g. a debugger)

Well, thankfully the people at Microsoft thought about that: the last parameter of RtlInstallFunctionTableCallback is the name of a DLL that exports a function named OutOfProcFunctionTableCallback.  Debuggers/etc can use this callback function to access the function tables in cases where there isn't a live process or code can't be run in the process.  If you look at the exports (dumpbin /exports) on mscordacwks.dll, you'll see it exports OutOfProcFunctionTableCallback.

For "normal" (native) x64 frames, dbghelp provides SymFunctionTableAccess64 to resolve an IP to a function table entry (StackWalk64 calls this internally, it's usually passed as parameter 7 "FunctionTableAccessRoutine").  However, the built in functions seem to break down on a mixed mode stack.  In my attempts I couldn't get StackWalk64 to get past certain managed frames.  I got as far as trying to reverse engineer the function table linked list (you can get it with RtlGetFunctionTableListHead) and manually calling the callback in mscordacwks from my own callback installed with SymRegisterFunctionEntryCallback64, but was never successful.  If anyone knows how to get SymFunctionTableAccess64 to "play nice," with managed code, I'd be interested to hear.

Try #2

As an alternative, I looked into using the IDebugClient APIs exposed in dbgeng.dll.  This DLL is the core of windbg, cdb, etc., and the IDebug* APIs are actually very easy to use.  The biggest bonus is that any code you write using these APIs instantly supports both live and dump debugging, assuming you stick to only the IDebug* APIs.  (Ironically, the steps I describe below only work on live targets, but they're fairly easy to adapt for dump debugging too.)

The IDebugClient COM object (and others) are all created through DebugCreate, and the workflow here is pretty simple.

  1. DebugCreate an IDebugClient
  2. Call AttachProcess on the client (in my case I used DEBUG_ATTACH_NONINVASIVE | DEBUG_ATTACH_NONINVASIVE_NO_SUSPEND which makes sure the debugger doesn't actually do anything to the process).
  3. QI the IDebugClient for IDebugControl4 (or DebugCreate it)
  4. For each thread in the target process
    1. OpenThread
    2.   SuspendThread
    3.     GetThreadContext
    4.     IDebugControl4::GetContextStackTrace
    5.   ResumeThread
    6. CloseHandle

Following this simple(?) 9 step process will get you a nice DEBUG_STACK_FRAME[] for each thread in the target process.  In my tests, the whole step 4 (the only invasive part of the process) took basically no measurable amount of time.  The slow part is symbol resolution (might need to hit a symbol server).

You might be curious why IDebug* is able to walk mixed-mode stacks correctly while StackWalk64 can't.  Well, if you put a breakpoint on OutOfProcFunctionTableCallback in mscordacwks, you can see that IDebugControl is passing in a custom function table callback (dbgeng!SwFunctionTableCallback) to StackWalk64, and not just using the stock SymFunctionTableAccess64 function in dbghelp.  I suspect there's some magic occurring internally that gets everything to work.

Putting it together: Symbol resolution

The final step of the native stack walk is resolving the IPs for the native frames to symbol names.  IDebugSymbols makes this simple, with IDebugSymbols3::GetNameByOffsetWide.  This is basically the equivalent to SymFromAddr (but supports unicode symbols).  Again, you can just QI the IDebugClient instance from step 1 for IDebugSymbols3 then call GetNameByOffsetWide for each frame's IP.  It will fail for some of the managed frames (some frames, such as ones in ngen'd assemblies might resolve "successfully") but will hopefully succeed for all the native frames.

Note you probably need to set up the symbol client with IDebugSymbols::SetSymbolPath.  One big "gotcha" with symbol server access is that, if your process is running as a service, the symbol server will try to use a proxy server unless explicitly told not to.  A full explanation is on MSDN here.

At this point, we have a full stack for every thread, and have resolved all the native frames to symbols.  Threads running managed code still have big gaps of unresolved frames.  Next up: Resolving the managed frames and getting more CLR diagnostic info.

(Part 2 here)

Tags:

clr | debugging | diagnostics | IDebugClient | stack-walk | x64
