Git Repo - binutils.git/blame_incremental

	1	\input texinfo @c -*-texinfo-*-
	2	@setfilename gprof.info
	3	@c Copyright (C) 1988-2022 Free Software Foundation, Inc.
	4	@settitle GNU gprof
	5	@setchapternewpage odd
	6
	7	@c man begin INCLUDE
	8	@include bfdver.texi
	9	@c man end
	10
	11	@ifnottex
	12	@c This is a dir.info fragment to support semi-automated addition of
	13	@c manuals to an info tree. [email protected] is developing this facility.
	14	@dircategory Software development
	15	@direntry
	16	* gprof: (gprof). Profiling your program's execution
	17	@end direntry
	18	@end ifnottex
	19
	20	@copying
	21	This file documents the gprof profiler of the GNU system.
	22
	23	@c man begin COPYRIGHT
	24	Copyright @copyright{} 1988-2022 Free Software Foundation, Inc.
	25
	26	Permission is granted to copy, distribute and/or modify this document
	27	under the terms of the GNU Free Documentation License, Version 1.3
	28	or any later version published by the Free Software Foundation;
	29	with no Invariant Sections, with no Front-Cover Texts, and with no
	30	Back-Cover Texts. A copy of the license is included in the
	31	section entitled ``GNU Free Documentation License''.
	32
	33	@c man end
	34	@end copying
	35
	36	@finalout
	37	@smallbook
	38
	39	@titlepage
	40	@title GNU gprof
	41	@subtitle The @sc{gnu} Profiler
	42	@ifset VERSION_PACKAGE
	43	@subtitle @value{VERSION_PACKAGE}
	44	@end ifset
	45	@subtitle Version @value{VERSION}
	46	@author Jay Fenlason and Richard Stallman
	47
	48	@page
	49
	50	This manual describes the @sc{gnu} profiler, @code{gprof}, and how you
	51	can use it to determine which parts of a program are taking most of the
	52	execution time. We assume that you know how to write, compile, and
	53	execute programs. @sc{gnu} @code{gprof} was written by Jay Fenlason.
	54	Eric S. Raymond made some minor corrections and additions in 2003.
	55
	56	@vskip 0pt plus 1filll
	57	Copyright @copyright{} 1988-2022 Free Software Foundation, Inc.
	58
	59	Permission is granted to copy, distribute and/or modify this document
	60	under the terms of the GNU Free Documentation License, Version 1.3
	61	or any later version published by the Free Software Foundation;
	62	with no Invariant Sections, with no Front-Cover Texts, and with no
	63	Back-Cover Texts. A copy of the license is included in the
	64	section entitled ``GNU Free Documentation License''.
	65
	66	@end titlepage
	67	@contents
	68
	69	@ifnottex
	70	@node Top
	71	@top Profiling a Program: Where Does It Spend Its Time?
	72
	73	This manual describes the @sc{gnu} profiler, @code{gprof}, and how you
	74	can use it to determine which parts of a program are taking most of the
	75	execution time. We assume that you know how to write, compile, and
	76	execute programs. @sc{gnu} @code{gprof} was written by Jay Fenlason.
	77
	78	This manual is for @code{gprof}
	79	@ifset VERSION_PACKAGE
	80	@value{VERSION_PACKAGE}
	81	@end ifset
	82	version @value{VERSION}.
	83
	84	This document is distributed under the terms of the GNU Free
	85	Documentation License version 1.3. A copy of the license is included
	86	in the section entitled ``GNU Free Documentation License''.
	87
	88	@menu
	89	* Introduction:: What profiling means, and why it is useful.
	90
	91	* Compiling:: How to compile your program for profiling.
	92	* Executing:: Executing your program to generate profile data
	93	* Invoking:: How to run @code{gprof}, and its options
	94
	95	* Output:: Interpreting @code{gprof}'s output
	96
	97	* Inaccuracy:: Potential problems you should be aware of
	98	* How do I?:: Answers to common questions
	99	* Incompatibilities:: (between @sc{gnu} @code{gprof} and Unix @code{gprof}.)
	100	* Details:: Details of how profiling is done
	101	* GNU Free Documentation License:: GNU Free Documentation License
	102	@end menu
	103	@end ifnottex
	104
	105	@node Introduction
	106	@chapter Introduction to Profiling
	107
	108	@ifset man
	109	@c man title gprof display call graph profile data
	110
	111	@smallexample
	112	@c man begin SYNOPSIS
	113	gprof [ -[abcDhilLrsTvwxyz] ] [ -[ABCeEfFJnNOpPqQRStZ][@var{name}] ]
	114	[ -I @var{dirs} ] [ -d[@var{num}] ] [ -k @var{from/to} ]
	115	[ -m @var{min-count} ] [ -R @var{map_file} ] [ -t @var{table-length} ]
	116	[ --[no-]annotated-source[=@var{name}] ]
	117	[ --[no-]exec-counts[=@var{name}] ]
	118	[ --[no-]flat-profile[=@var{name}] ] [ --[no-]graph[=@var{name}] ]
	119	[ --[no-]time=@var{name}] [ --all-lines ] [ --brief ]
	120	[ --debug[=@var{level}] ] [ --function-ordering ]
	121	[ --file-ordering @var{map_file} ] [ --directory-path=@var{dirs} ]
	122	[ --display-unused-functions ] [ --file-format=@var{name} ]
	123	[ --file-info ] [ --help ] [ --line ] [ --inline-file-names ]
	124	[ --min-count=@var{n} ] [ --no-static ] [ --print-path ]
	125	[ --separate-files ] [ --static-call-graph ] [ --sum ]
	126	[ --table-length=@var{len} ] [ --traditional ] [ --version ]
	127	[ --width=@var{n} ] [ --ignore-non-functions ]
	128	[ --demangle[=@var{STYLE}] ] [ --no-demangle ]
	129	[ --external-symbol-table=@var{name} ]
	130	[ @var{image-file} ] [ @var{profile-file} @dots{} ]
	131	@c man end
	132	@end smallexample
	133
	134	@c man begin DESCRIPTION
	135	@code{gprof} produces an execution profile of C, Pascal, or Fortran77
	136	programs. The effect of called routines is incorporated in the profile
	137	of each caller. The profile data is taken from the call graph profile file
	138	(@file{gmon.out} by default), which is created by programs
	139	that are compiled with the @samp{-pg} option of
	140	@code{cc}, @code{pc}, and @code{f77}.
	141	The @samp{-pg} option also links in versions of the library routines
	142	that are compiled for profiling. @code{Gprof} reads the given object
	143	file (the default is @code{a.out}) and establishes the relation between
	144	its symbol table and the call graph profile from @file{gmon.out}.
	145	If more than one profile file is specified, the @code{gprof}
	146	output shows the sum of the profile information in the given profile files.
	147
	148	@code{Gprof} calculates the amount of time spent in each routine.
	149	Next, these times are propagated along the edges of the call graph.
	150	Cycles are discovered, and calls into a cycle are made to share the time
	151	of the cycle.
	152
	153	@c man end
	154
	155	@c man begin BUGS
	156	The granularity of the sampling is shown, but remains
	157	statistical at best.
	158	We assume that the time for each execution of a function
	159	can be expressed by the total time for the function divided
	160	by the number of times the function is called.
	161	Thus the time propagated along the call graph arcs to the function's
	162	parents is directly proportional to the number of times that
	163	arc is traversed.
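
As a worked illustration of this averaging assumption, the sketch below
uses made-up figures that do not come from any real profile: a function
@code{child} accumulates 0.40 seconds over 4 calls, 3 of them from
@code{parent_a} and 1 from @code{parent_b}. Each parent is charged the
average time per call multiplied by the number of times its arc was
traversed.

```shell
# Hypothetical figures: total time in child, and call counts per arc.
shares=$(awk 'BEGIN {
    total = 0.40      # seconds charged to child (self plus descendants)
    calls_a = 3       # arc parent_a -> child traversed 3 times
    calls_b = 1       # arc parent_b -> child traversed once
    per_call = total / (calls_a + calls_b)   # assumed equal for every call
    printf "parent_a %.2f\n", per_call * calls_a
    printf "parent_b %.2f\n", per_call * calls_b
}')
printf '%s\n' "$shares"
```

If the calls from @code{parent_a} were in fact much slower than the one
from @code{parent_b}, the profile could not show it: both arcs are
charged at the same average rate.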
	164
	165	Parents that are not themselves profiled will have the time of
	166	their profiled children propagated to them, but they will appear
	167	to be spontaneously invoked in the call graph listing, and will
	168	not have their time propagated further.
	169	Similarly, signal catchers, even though profiled, will appear
	170	to be spontaneous (although for more obscure reasons).
	171	Any profiled children of signal catchers should have their times
	172	propagated properly, unless the signal catcher was invoked during
	173	the execution of the profiling routine, in which case all is lost.
	174
	175	The profiled program must call @code{exit}(3)
	176	or return normally for the profiling information to be saved
	177	in the @file{gmon.out} file.
	178	@c man end
	179
	180	@c man begin FILES
	181	@table @code
	182	@item @file{a.out}
	183	the namelist and text space.
	184	@item @file{gmon.out}
	185	dynamic call graph and profile.
	186	@item @file{gmon.sum}
	187	summarized dynamic call graph and profile.
	188	@end table
	189	@c man end
	190
	191	@c man begin SEEALSO
	192	monitor(3), profil(2), cc(1), prof(1), and the Info entry for @file{gprof}.
	193
	194	``An Execution Profiler for Modular Programs'',
	195	by S. Graham, P. Kessler, M. McKusick;
	196	Software - Practice and Experience,
	197	Vol. 13, pp. 671-685, 1983.
	198
	199	``gprof: A Call Graph Execution Profiler'',
	200	by S. Graham, P. Kessler, M. McKusick;
	201	Proceedings of the SIGPLAN '82 Symposium on Compiler Construction,
	202	SIGPLAN Notices, Vol. 17, No 6, pp. 120-126, June 1982.
	203	@c man end
	204	@end ifset
	205
	206	Profiling allows you to learn where your program spent its time and which
	207	functions called which other functions while it was executing. This
	208	information can show you which pieces of your program are slower than you
	209	expected, and might be candidates for rewriting to make your program
	210	execute faster. It can also tell you which functions are being called more
	211	or less often than you expected. This may help you spot bugs that would
	212	otherwise have gone unnoticed.
	213
	214	Since the profiler uses information collected during the actual execution
	215	of your program, it can be used on programs that are too large or too
	216	complex to analyze by reading the source. However, how your program is run
	217	will affect the information that shows up in the profile data. If you
	218	don't use some feature of your program while it is being profiled, no
	219	profile information will be generated for that feature.
	220
	221	Profiling has several steps:
	222
	223	@itemize @bullet
	224	@item
	225	You must compile and link your program with profiling enabled.
	226	@xref{Compiling, ,Compiling a Program for Profiling}.
	227
	228	@item
	229	You must execute your program to generate a profile data file.
	230	@xref{Executing, ,Executing the Program}.
	231
	232	@item
	233	You must run @code{gprof} to analyze the profile data.
	234	@xref{Invoking, ,@code{gprof} Command Summary}.
	235	@end itemize
	236
	237	The next three chapters explain these steps in greater detail.
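
The three steps can be sketched as one short shell session. This is only
an illustration under stated assumptions: the scratch directory, the
file names @file{myprog.c} and @file{profile.txt}, and the function
@code{work} are hypothetical, and the session assumes a @code{cc} that
accepts @samp{-pg} and a @code{gprof} on the search path.

```shell
# Work in a scratch directory so that gmon.out does not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny program to profile (hypothetical example code).
cat > myprog.c <<'EOF'
#include <stdio.h>
static long work(long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += i % 7;
    return s;
}
int main(void)
{
    printf("%ld\n", work(20000000L));
    return 0;
}
EOF

if command -v cc >/dev/null 2>&1 && command -v gprof >/dev/null 2>&1; then
    # Step 1: compile and link with profiling enabled.
    # Step 2: run the program; it writes gmon.out when it exits normally.
    # Step 3: analyze the profile data.
    cc -g -pg -o myprog myprog.c &&
        ./myprog > /dev/null &&
        gprof myprog gmon.out > profile.txt ||
        echo "profiling run failed" >&2
else
    echo "cc or gprof not found; skipping" >&2
fi
```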
	238
	239	@c man begin DESCRIPTION
	240
	241	Several forms of output are available from the analysis.
	242
	243	The @dfn{flat profile} shows how much time your program spent in each function,
	244	and how many times that function was called. If you simply want to know
	245	which functions burn most of the cycles, it is stated concisely here.
	246	@xref{Flat Profile, ,The Flat Profile}.
	247
	248	The @dfn{call graph} shows, for each function, which functions called it, which
	249	other functions it called, and how many times. There is also an estimate
	250	of how much time was spent in the subroutines of each function. This can
	251	suggest places where you might try to eliminate function calls that use a
	252	lot of time. @xref{Call Graph, ,The Call Graph}.
	253
	254	The @dfn{annotated source} listing is a copy of the program's
	255	source code, labeled with the number of times each line of the
	256	program was executed. @xref{Annotated Source, ,The Annotated Source
	257	Listing}.
	258	@c man end
	259
	260	To better understand how profiling works, you may wish to read
	261	a description of its implementation.
	262	@xref{Implementation, ,Implementation of Profiling}.
	263
	264	@node Compiling
	265	@chapter Compiling a Program for Profiling
	266
	267	The first step in generating profile information for your program is
	268	to compile and link it with profiling enabled.
	269
	270	To compile a source file for profiling, specify the @samp{-pg} option when
	271	you run the compiler. (This is in addition to the options you normally
	272	use.)
	273
	274	To link the program for profiling, if you use a compiler such as @code{cc}
	275	to do the linking, simply specify @samp{-pg} in addition to your usual
	276	options. The same option, @samp{-pg}, alters either compilation or linking
	277	to do what is necessary for profiling. Here are examples:
	278
	279	@example
	280	cc -g -c myprog.c utils.c -pg
	281	cc -o myprog myprog.o utils.o -pg
	282	@end example
	283
	284	The @samp{-pg} option also works with a command that both compiles and links:
	285
	286	@example
	287	cc -o myprog myprog.c utils.c -g -pg
	288	@end example
	289
	290	Note: The @samp{-pg} option must be part of your compilation options
	291	as well as your link options. If it is not, then no call-graph data
	292	will be gathered, and when you run @code{gprof} you will get an error
	293	message like this:
	294
	295	@example
	296	gprof: gmon.out file is missing call-graph data
	297	@end example
	298
	299	If you add the @samp{-Q} switch to suppress the printing of the call
	300	graph data, you will still be able to see the time samples:
	301
	302	@example
	303	Flat profile:
	304
	305	Each sample counts as 0.01 seconds.
	306	  %   cumulative   self              self     total
	307	 time   seconds   seconds    calls  Ts/call  Ts/call  name
	308	 44.12      0.07     0.07                             zazLoop
	309	 35.29      0.14     0.06                             main
	310	 20.59      0.17     0.04                             bazMillion
	311	@end example
	312
	313	If you run the linker @code{ld} directly instead of through a compiler
	314	such as @code{cc}, you may have to specify a profiling startup file
	315	@file{gcrt0.o} as the first input file instead of the usual startup
	316	file @file{crt0.o}. In addition, you would probably want to
	317	specify the profiling C library, @file{libc_p.a}, by writing
	318	@samp{-lc_p} instead of the usual @samp{-lc}. This is not absolutely
	319	necessary, but doing this gives you number-of-calls information for
	320	standard library functions such as @code{read} and @code{open}. For
	321	example:
	322
	323	@example
	324	ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p
	325	@end example
	326
	327	If you are running the program on a system which supports shared
	328	libraries you may run into problems with the profiling support code in
	329	a shared library being called before that library has been fully
	330	initialised. This is usually detected by the program encountering a
	331	segmentation fault as soon as it is run. The solution is to link
	332	against a static version of the library containing the profiling
	333	support code, which for @code{gcc} users can be done via the
	334	@samp{-static} or @samp{-static-libgcc} command-line option. For
	335	example:
	336
	337	@example
	338	gcc -g -pg -static-libgcc myprog.c utils.c -o myprog
	339	@end example
	340
	341	If you compile only some of the modules of the program with @samp{-pg}, you
	342	can still profile the program, but you won't get complete information about
	343	the modules that were compiled without @samp{-pg}. The only information
	344	you get for the functions in those modules is the total time spent in them;
	345	there is no record of how many times they were called, or from where. This
	346	will not affect the flat profile (except that the @code{calls} field for
	347	the functions will be blank), but will greatly reduce the usefulness of the
	348	call graph.
	349
	350	If you wish to perform line-by-line profiling you should use the
	351	@code{gcov} tool instead of @code{gprof}. See that tool's manual or
	352	info pages for more details of how to do this.
	353
	354	Note, older versions of @code{gcc} produce line-by-line profiling
	355	information that works with @code{gprof} rather than @code{gcov} so
	356	there is still support for displaying this kind of information in
	357	@code{gprof}. @xref{Line-by-line, ,Line-by-line Profiling}.
	358
	359	It is also worth noting that @code{gcc} implements a
	360	@samp{-finstrument-functions} command-line option which will insert
	361	calls to special user-supplied instrumentation routines at the entry
	362	and exit of every function in the program. This can be used to
	363	implement an alternative profiling scheme.
	364
	365	@node Executing
	366	@chapter Executing the Program
	367
	368	Once the program is compiled for profiling, you must run it in order to
	369	generate the information that @code{gprof} needs. Simply run the program
	370	as usual, using the normal arguments, file names, etc. The program should
	371	run normally, producing the same output as usual. It will, however, run
	372	somewhat slower than normal because of the time spent collecting and
	373	writing the profile data.
	374
	375	The way you run the program---the arguments and input that you give
	376	it---may have a dramatic effect on what the profile information shows. The
	377	profile data will describe the parts of the program that were activated for
	378	the particular input you use. For example, if the first command you give
	379	to your program is to quit, the profile data will show the time used in
	380	initialization and in cleanup, but not much else.
	381
	382	Your program will write the profile data into a file called @file{gmon.out}
	383	just before exiting. If there is already a file called @file{gmon.out},
	384	its contents are overwritten. You can rename the file afterwards if you
	385	are concerned that it may be overwritten. If your system libc allows it, you
	386	may be able to write the profile data under a different name: set the
	387	@env{GMON_OUT_PREFIX} environment variable, and the PID of the running
	388	program is appended to this name.
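
Renaming the file after each run can be wrapped in a small shell helper.
The function name @code{save_profile} and the @file{gmon.@var{label}.out}
naming scheme below are hypothetical conveniences, not something
@code{gprof} or the profiling library provides.

```shell
# save_profile LABEL
# Hypothetical helper: rename ./gmon.out to gmon.LABEL.out so that the
# next profiled run does not overwrite it.
save_profile() {
    if [ ! -f gmon.out ]; then
        echo "save_profile: no gmon.out in $(pwd)" >&2
        return 1
    fi
    mv gmon.out "gmon.$1.out"
}

# Typical use, assuming ./myprog was built with -pg:
#   ./myprog input1 && save_profile run1
#   ./myprog input2 && save_profile run2
```

The saved files can later be named together on the @code{gprof} command
line, in which case their statistics are summed.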
	389
	390	In order to write the @file{gmon.out} file properly, your program must exit
	391	normally: by returning from @code{main} or by calling @code{exit}. Calling
	392	the low-level function @code{_exit} does not write the profile data, and
	393	neither does abnormal termination due to an unhandled signal.
	394
	395	The @file{gmon.out} file is written in the program's @emph{current working
	396	directory} at the time it exits. This means that if your program calls
	397	@code{chdir}, the @file{gmon.out} file will be left in the last directory
	398	your program @code{chdir}'d to. If you don't have permission to write in
	399	this directory, the file is not written, and you will get an error message.
	400
	401	Older versions of the @sc{gnu} profiling library may also write a file
	402	called @file{bb.out}. This file, if present, contains a human-readable
	403	listing of the basic-block execution counts. Unfortunately, the
	404	appearance of a human-readable @file{bb.out} means the basic-block
	405	counts didn't get written into @file{gmon.out}.
	406	The Perl script @code{bbconv.pl}, included with the @code{gprof}
	407	source distribution, will convert a @file{bb.out} file into
	408	a format readable by @code{gprof}. Invoke it like this:
	409
	410	@smallexample
	411	bbconv.pl < bb.out > @var{bb-data}
	412	@end smallexample
	413
	414	This translates the information in @file{bb.out} into a form that
	415	@code{gprof} can understand. But you still need to tell @code{gprof}
	416	about the existence of this translated information. To do that, include
	417	@var{bb-data} on the @code{gprof} command line, @emph{along with
	418	@file{gmon.out}}, like this:
	419
	420	@smallexample
	421	gprof @var{options} @var{executable-file} gmon.out @var{bb-data} [@var{yet-more-profile-data-files}@dots{}] [> @var{outfile}]
	422	@end smallexample
	423
	424	@node Invoking
	425	@chapter @code{gprof} Command Summary
	426
	427	After you have a profile data file @file{gmon.out}, you can run @code{gprof}
	428	to interpret the information in it. The @code{gprof} program prints a
	429	flat profile and a call graph on standard output. Typically you would
	430	redirect the output of @code{gprof} into a file with @samp{>}.
	431
	432	You run @code{gprof} like this:
	433
	434	@smallexample
	435	gprof @var{options} [@var{executable-file} [@var{profile-data-files}@dots{}]] [> @var{outfile}]
	436	@end smallexample
	437
	438	@noindent
	439	Here square brackets indicate optional arguments.
	440
	441	If you omit the executable file name, the file @file{a.out} is used. If
	442	you give no profile data file name, the file @file{gmon.out} is used. If
	443	any file is not in the proper format, or if the profile data file does not
	444	appear to belong to the executable file, an error message is printed.
	445
	446	You can give more than one profile data file by entering all their names
	447	after the executable file name; then the statistics in all the data files
	448	are summed together.
	449
	450	The order of these options does not matter.
	451
	452	@menu
	453	* Output Options:: Controlling @code{gprof}'s output style
	454	* Analysis Options:: Controlling how @code{gprof} analyzes its data
	455	* Miscellaneous Options::
	456	* Deprecated Options:: Options you no longer need to use, but which
	457	have been retained for compatibility
	458	* Symspecs:: Specifying functions to include or exclude
	459	@end menu
	460
	461	@node Output Options
	462	@section Output Options
	463
	464	@c man begin OPTIONS
	465	These options specify which of several output formats
	466	@code{gprof} should produce.
	467
	468	Many of these options take an optional @dfn{symspec} to specify
	469	functions to be included or excluded. These options can be
	470	specified multiple times, with different symspecs, to include
	471	or exclude sets of symbols. @xref{Symspecs, ,Symspecs}.
	472
	473	Specifying any of these options overrides the default (@samp{-p -q}),
	474	which prints a flat profile and call graph analysis
	475	for all functions.
	476
	477	@table @code
	478
	479	@item -A[@var{symspec}]
	480	@itemx --annotated-source[=@var{symspec}]
	481	The @samp{-A} option causes @code{gprof} to print annotated source code.
	482	If @var{symspec} is specified, print output only for matching symbols.
	483	@xref{Annotated Source, ,The Annotated Source Listing}.
	484
	485	@item -b
	486	@itemx --brief
	487	If the @samp{-b} option is given, @code{gprof} doesn't print the
	488	verbose blurbs that try to explain the meaning of all of the fields in
	489	the tables. This is useful if you intend to print out the output, or
	490	are tired of seeing the blurbs.
	491
	492	@item -B
	493	The @samp{-B} option causes @code{gprof} to print the call graph analysis.
	494
	495	@item -C[@var{symspec}]
	496	@itemx --exec-counts[=@var{symspec}]
	497	The @samp{-C} option causes @code{gprof} to
	498	print a tally of functions and the number of times each was called.
	499	If @var{symspec} is specified, print tally only for matching symbols.
	500
	501	If the profile data file contains basic-block count records, specifying
	502	the @samp{-l} option, along with @samp{-C}, will cause basic-block
	503	execution counts to be tallied and displayed.
	504
	505	@item -i
	506	@itemx --file-info
	507	The @samp{-i} option causes @code{gprof} to display summary information
	508	about the profile data file(s) and then exit. The number of histogram,
	509	call graph, and basic-block count records is displayed.
	510
	511	@item -I @var{dirs}
	512	@itemx --directory-path=@var{dirs}
	513	The @samp{-I} option specifies a list of search directories in
	514	which to find source files. Environment variable @var{GPROF_PATH}
	515	can also be used to convey this information.
	516	Used mostly for annotated source output.
	517
	518	@item -J[@var{symspec}]
	519	@itemx --no-annotated-source[=@var{symspec}]
	520	The @samp{-J} option causes @code{gprof} not to
	521	print annotated source code.
	522	If @var{symspec} is specified, @code{gprof} prints annotated source,
	523	but excludes matching symbols.
	524
	525	@item -L
	526	@itemx --print-path
	527	Normally, source filenames are printed with the path
	528	component suppressed. The @samp{-L} option causes @code{gprof}
	529	to print the full pathname of
	530	source filenames, which is determined
	531	from symbolic debugging information in the image file
	532	and is relative to the directory in which the compiler
	533	was invoked.
	534
	535	@item -p[@var{symspec}]
	536	@itemx --flat-profile[=@var{symspec}]
	537	The @samp{-p} option causes @code{gprof} to print a flat profile.
	538	If @var{symspec} is specified, print flat profile only for matching symbols.
	539	@xref{Flat Profile, ,The Flat Profile}.
	540
	541	@item -P[@var{symspec}]
	542	@itemx --no-flat-profile[=@var{symspec}]
	543	The @samp{-P} option causes @code{gprof} to suppress printing a flat profile.
	544	If @var{symspec} is specified, @code{gprof} prints a flat profile,
	545	but excludes matching symbols.
	546
	547	@item -q[@var{symspec}]
	548	@itemx --graph[=@var{symspec}]
	549	The @samp{-q} option causes @code{gprof} to print the call graph analysis.
	550	If @var{symspec} is specified, print call graph only for matching symbols
	551	and their children.
	552	@xref{Call Graph, ,The Call Graph}.
	553
	554	@item -Q[@var{symspec}]
	555	@itemx --no-graph[=@var{symspec}]
	556	The @samp{-Q} option causes @code{gprof} to suppress printing the
	557	call graph.
	558	If @var{symspec} is specified, @code{gprof} prints a call graph,
	559	but excludes matching symbols.
	560
	561	@item -t @var{num}
	562	@itemx --table-length=@var{num}
	563	The @samp{-t} option causes the @var{num} most active source lines in
	564	each source file to be listed when source annotation is enabled. The
	565	default is 10.
	566
	567	@item -y
	568	@itemx --separate-files
	569	This option affects annotated source output only.
	570	Normally, @code{gprof} prints annotated source files
	571	to standard-output. If this option is specified,
	572	annotated source for a file named @file{path/@var{filename}}
	573	is generated in the file @file{@var{filename}-ann}. If the underlying
	574	file system would truncate @file{@var{filename}-ann} so that it
	575	overwrites the original @file{@var{filename}}, @code{gprof} generates
	576	annotated source in the file @file{@var{filename}.ann} instead (if the
	577	original file name has an extension, that extension is @emph{replaced}
	578	with @file{.ann}).
	579
	580	@item -Z[@var{symspec}]
	581	@itemx --no-exec-counts[=@var{symspec}]
	582	The @samp{-Z} option causes @code{gprof} not to
	583	print a tally of functions and the number of times each was called.
	584	If @var{symspec} is specified, print tally, but exclude matching symbols.
	585
	586	@item -r
	587	@itemx --function-ordering
	588	The @samp{--function-ordering} option causes @code{gprof} to print a
	589	suggested function ordering for the program based on profiling data.
	590	This option suggests an ordering which may improve paging, TLB and
	591	cache behavior for the program on systems which support arbitrary
	592	ordering of functions in an executable.
	593
	594	The exact details of how to force the linker to place functions
	595	in a particular order are system dependent and beyond the scope of this
	596	manual.
	597
	598	@item -R @var{map_file}
	599	@itemx --file-ordering @var{map_file}
	600	The @samp{--file-ordering} option causes @code{gprof} to print a
	601	suggested .o link line ordering for the program based on profiling data.
	602	This option suggests an ordering which may improve paging, TLB and
	603	cache behavior for the program on systems which do not support arbitrary
	604	ordering of functions in an executable.
	605
	606	Use of the @samp{-a} argument is highly recommended with this option.
	607
	608	The @var{map_file} argument is a pathname to a file which provides
	609	function name to object file mappings. The format of the file is similar to
	610	the output of the program @code{nm}.
	611
	612	@smallexample
	613	@group
	614	c-parse.o:00000000 T yyparse
	615	c-parse.o:00000004 C yyerrflag
	616	c-lang.o:00000000 T maybe_objc_method_name
	617	c-lang.o:00000000 T print_lang_statistics
	618	c-lang.o:00000000 T recognize_objc_keyword
	619	c-decl.o:00000000 T print_lang_identifier
	620	c-decl.o:00000000 T print_lang_type
	621	@dots{}
	622
	623	@end group
	624	@end smallexample
	625
	626	To create a @var{map_file} with @sc{gnu} @code{nm}, type a command like
	627	@kbd{nm --extern-only --defined-only -v --print-file-name program-name}.
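
To make the field layout of a @var{map_file} concrete, the sketch below
takes the sample lines shown above and counts the text (@samp{T})
symbols recorded for each object file. It only illustrates the format
and makes no claim about how @code{gprof} itself reads the file; the
variable names are arbitrary.

```shell
# Sample lines copied from the map file excerpt above.
map_file=$(mktemp)
cat > "$map_file" <<'EOF'
c-parse.o:00000000 T yyparse
c-parse.o:00000004 C yyerrflag
c-lang.o:00000000 T maybe_objc_method_name
c-lang.o:00000000 T print_lang_statistics
c-lang.o:00000000 T recognize_objc_keyword
c-decl.o:00000000 T print_lang_identifier
c-decl.o:00000000 T print_lang_type
EOF

# Field 1 is OBJECT:ADDRESS, field 2 is the nm type letter, and
# field 3 is the symbol name.  Count the T symbols per object file.
counts=$(awk '{
    split($1, part, ":")
    if ($2 == "T") count[part[1]]++
}
END {
    for (obj in count) print obj, count[obj]
}' "$map_file" | sort)
printf '%s\n' "$counts"
rm -f "$map_file"
```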
	628
	629	@item -T
	630	@itemx --traditional
	631	The @samp{-T} option causes @code{gprof} to print its output in
	632	``traditional'' BSD style.
	633
	634	@item -w @var{width}
	635	@itemx --width=@var{width}
	636	Sets width of output lines to @var{width}.
	637	Currently only used when printing the function index at the bottom
	638	of the call graph.
	639
	640	@item -x
	641	@itemx --all-lines
	642	This option affects annotated source output only.
	643	By default, only the lines at the beginning of a basic-block
	644	are annotated. If this option is specified, every line in
	645	a basic-block is annotated by repeating the annotation for the
	646	first line. This behavior is similar to @code{tcov}'s @samp{-a}.
	647
	648	@item --demangle[=@var{style}]
	649	@itemx --no-demangle
	650	These options control whether C++ symbol names should be demangled when
	651	printing output. The default is to demangle symbols. The
	652	@code{--no-demangle} option may be used to turn off demangling. Different
	653	compilers have different mangling styles. The optional demangling style
	654	argument can be used to choose an appropriate demangling style for your
	655	compiler.
	656	@end table
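
Because giving any of these options overrides the default
(@samp{-p -q}), a single output format can be selected by naming just
that option. The two wrapper functions below are hypothetical
conveniences, shown only to illustrate combining @samp{-b} with
@samp{-p} or @samp{-q}; the option letters are the ones documented in
the table above.

```shell
# flat_profile EXE [PROFILE]
# Print only the flat profile, without the explanatory blurbs.
flat_profile() {
    gprof -b -p "$1" "${2:-gmon.out}"
}

# call_graph EXE [PROFILE]
# Print only the call graph, without the explanatory blurbs.
call_graph() {
    gprof -b -q "$1" "${2:-gmon.out}"
}

# Typical use, assuming ./myprog was built with -pg and has been run:
#   flat_profile myprog > flat.txt
#   call_graph myprog gmon.out > graph.txt
```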
	657
	658	@node Analysis Options
	659	@section Analysis Options
	660
	661	@table @code
	662
	663	@item -a
	664	@itemx --no-static
	665	The @samp{-a} option causes @code{gprof} to suppress the printing of
	666	statically declared (private) functions. (These are functions whose
	667	names are not listed as global, and which are not visible outside the
	668	file/function/block where they were defined.) Time spent in these
	669	functions, calls to/from them, etc., will all be attributed to the
	670	function that was loaded directly before it in the executable file.
	671	@c This is compatible with Unix @code{gprof}, but a bad idea.
	672	This option affects both the flat profile and the call graph.
	673
	674	@item -c
	675	@itemx --static-call-graph
	676	The @samp{-c} option causes the call graph of the program to be
	677	augmented by a heuristic which examines the text space of the object
	678	file and identifies function calls in the binary machine code.
	679	Since normal call graph records are only generated when functions are
	680	entered, this option identifies children that could have been called,
	681	but never were. Calls to functions that were not compiled with
	682	profiling enabled are also identified, but only if symbol table
	683	entries are present for them.
	684	Calls to dynamic library routines are typically @emph{not} found
	685	by this option.
	686	Parents or children identified via this heuristic
	687	are indicated in the call graph with call counts of @samp{0}.
	688
	689	@item -D
	690	@itemx --ignore-non-functions
	691	The @samp{-D} option causes @code{gprof} to ignore symbols which
	692	are not known to be functions. This option will give more accurate
	693	profile data on systems where it is supported (Solaris and HPUX for
	694	example).
	695
	696	@item -k @var{from}/@var{to}
	697	The @samp{-k} option allows you to delete from the call graph any arcs from
	698	symbols matching symspec @var{from} to those matching symspec @var{to}.
	699
	700	@item -l
	701	@itemx --line
	702	The @samp{-l} option enables line-by-line profiling, which causes
	703	histogram hits to be charged to individual source code lines,
	704	instead of functions. This feature only works with programs compiled
	705	by older versions of the @code{gcc} compiler. Newer versions of
	706	@code{gcc} are designed to work with the @code{gcov} tool instead.
	707
	708	If the program was compiled with basic-block counting enabled,
	709	this option will also identify how many times each line of
	710	code was executed.
	711	While line-by-line profiling can help isolate where in a large function
	712	a program is spending its time, it also significantly increases
	713	the running time of @code{gprof}, and magnifies statistical
	714	inaccuracies.
	715	@xref{Sampling Error, ,Statistical Sampling Error}.
	716
	717	@item --inline-file-names
	718	This option causes @code{gprof} to print the source file after each
	719	symbol in both the flat profile and the call graph. The full path to the
	720	file is printed if used with the @samp{-L} option.
	721
	722	@item -m @var{num}
	723	@itemx --min-count=@var{num}
	724	This option affects execution count output only.
	725	Symbols that are executed fewer than @var{num} times are suppressed.
	726
	727	@item -n@var{symspec}
	728	@itemx --time=@var{symspec}
	729	The @samp{-n} option causes @code{gprof}, in its call graph analysis,
	730	to only propagate times for symbols matching @var{symspec}.
	731
	732	@item -N@var{symspec}
	733	@itemx --no-time=@var{symspec}
	734	The @samp{-N} option causes @code{gprof}, in its call graph analysis,
	735	not to propagate times for symbols matching @var{symspec}.
	736
	737	@item -S@var{filename}
	738	@itemx --external-symbol-table=@var{filename}
	739	The @samp{-S} option causes @code{gprof} to read an external symbol table
	740	file, such as @file{/proc/kallsyms}, rather than read the symbol table
	741	from the given object file (the default is @code{a.out}). This is useful
	742	for profiling kernel modules.
	743
	744	@item -z
	745	@itemx --display-unused-functions
	746	If you give the @samp{-z} option, @code{gprof} will mention all
	747	functions in the flat profile, even those that were never called, and
	748	that had no time spent in them. This is useful in conjunction with the
	749	@samp{-c} option for discovering which routines were never called.
	750
	751	@end table
	752
	753	@node Miscellaneous Options
	754	@section Miscellaneous Options
	755
	756	@table @code
	757
	758	@item -d[@var{num}]
	759	@itemx --debug[=@var{num}]
	760	The @samp{-d @var{num}} option specifies debugging options.
	761	If @var{num} is not specified, enable all debugging.
	762	@xref{Debugging, ,Debugging @code{gprof}}.
	763
	764	@item -h
	765	@itemx --help
	766	The @samp{-h} option prints command line usage.
	767
	768	@item -O@var{name}
	769	@itemx --file-format=@var{name}
	770	Selects the format of the profile data files. Recognized formats are
	771	@samp{auto} (the default), @samp{bsd}, @samp{4.4bsd}, @samp{magic}, and
	772	@samp{prof} (not yet supported).
	773
	774	@item -s
	775	@itemx --sum
	776	The @samp{-s} option causes @code{gprof} to summarize the information
	777	in the profile data files it read in, and write out a profile data
	778	file called @file{gmon.sum}, which contains all the information from
	779	the profile data files that @code{gprof} read in. The file @file{gmon.sum}
	780	may be one of the specified input files; the effect of this is to
	781	merge the data in the other input files into @file{gmon.sum}.
	782
	783	Afterwards, you can run @code{gprof} again without @samp{-s} to analyze the
	784	cumulative data in the file @file{gmon.sum}.
	785
	786	@item -v
	787	@itemx --version
	788	The @samp{-v} flag causes @code{gprof} to print the current version
	789	number, and then exit.
	790
	791	@end table
	792
	793	@node Deprecated Options
	794	@section Deprecated Options
	795
	796	These options have been replaced with newer versions that use symspecs.
	797
	798	@table @code
	799
	800	@item -e @var{function_name}
	801	The @samp{-e @var{function_name}} option tells @code{gprof} not to print
	802	information about the function @var{function_name} (and its
	803	children@dots{}) in the call graph. The function will still be listed
	804	as a child of any functions that call it, but its index number will be
	805	shown as @samp{[not printed]}. More than one @samp{-e} option may be
	806	given; only one @var{function_name} may be indicated with each @samp{-e}
	807	option.
	808
	809	@item -E @var{function_name}
	810	The @samp{-E @var{function_name}} option works like the @samp{-e} option, but
	811	time spent in the function (and children who were not called from
	812	anywhere else), will not be used to compute the percentages-of-time for
	813	the call graph. More than one @samp{-E} option may be given; only one
	814	@var{function_name} may be indicated with each @samp{-E} option.
	815
	816	@item -f @var{function_name}
	817	The @samp{-f @var{function_name}} option causes @code{gprof} to limit the
	818	call graph to the function @var{function_name} and its children (and
	819	their children@dots{}). More than one @samp{-f} option may be given;
	820	only one @var{function_name} may be indicated with each @samp{-f}
	821	option.
	822
	823	@item -F @var{function_name}
	824	The @samp{-F @var{function_name}} option works like the @samp{-f} option, but
	825	only time spent in the function and its children (and their
	826	children@dots{}) will be used to determine total-time and
	827	percentages-of-time for the call graph. More than one @samp{-F} option
	828	may be given; only one @var{function_name} may be indicated with each
	829	@samp{-F} option. The @samp{-F} option overrides the @samp{-E} option.
	830
	831	@end table
	832
	833	@c man end
	834
	835	Note that only one function can be specified with each @code{-e},
	836	@code{-E}, @code{-f} or @code{-F} option. To specify more than one
	837	function, use multiple options. For example, this command:
	838
	839	@example
	840	gprof -e boring -f foo -f bar myprogram > gprof.output
	841	@end example
	842
	843	@noindent
	844	lists in the call graph all functions that were reached from either
	845	@code{foo} or @code{bar} and were not reachable from @code{boring}.
	846
	847	@node Symspecs
	848	@section Symspecs
	849
	850	Many of the output options allow functions to be included or excluded
	851	using @dfn{symspecs} (symbol specifications), which observe the
	852	following syntax:
	853
	854	@example
	855	  filename_containing_a_dot
	856	| funcname_not_containing_a_dot
	857	| linenumber
	858	| ( [ any_filename ] `:' ( any_funcname | linenumber ) )
	859	@end example
	860
	861	Here are some sample symspecs:
	862
	863	@table @samp
	864	@item main.c
	865	Selects everything in file @file{main.c}---the
	866	dot in the string tells @code{gprof} to interpret
	867	the string as a filename, rather than as
	868	a function name. To select a file whose
	869	name does not contain a dot, a trailing colon
	870	should be specified. For example, @samp{odd:} is
	871	interpreted as the file named @file{odd}.
	872
	873	@item main
	874	Selects all functions named @samp{main}.
	875
	876	Note that there may be multiple instances of the same function name
	877	because some of the definitions may be local (i.e., static). Unless a
	878	function name is unique in a program, you must use the colon notation
	879	explained below to specify a function from a specific source file.
	880
	881	Sometimes, function names contain dots. In such cases, it is necessary
	882	to add a leading colon to the name. For example, @samp{:.mul} selects
	883	function @samp{.mul}.
	884
	885	In some object file formats, symbols have a leading underscore.
	886	@code{gprof} will normally not print these underscores. When you name a
	887	symbol in a symspec, you should type it exactly as @code{gprof} prints
	888	it in its output. For example, if the compiler produces a symbol
	889	@samp{_main} from your @code{main} function, @code{gprof} still prints
	890	it as @samp{main} in its output, so you should use @samp{main} in
	891	symspecs.
	892
	893	@item main.c:main
	894	Selects function @samp{main} in file @file{main.c}.
	895
	896	@item main.c:134
	897	Selects line 134 in file @file{main.c}.
	898	@end table
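
As a rough illustration of the grammar above, the following Python sketch
splits a symspec string into its filename, function name, and line number
parts. It is not taken from the @code{gprof} sources: the function name
@code{parse_symspec} and the tuple layout are assumptions made for this
example, and the real parser may treat edge cases differently.

```python
def parse_symspec(spec):
    """Split a symspec into (filename, funcname, line); unused parts are None.

    Hypothetical helper that follows the documented grammar only.
    """
    if ":" in spec:
        # Colon form: [ any_filename ] ':' ( any_funcname | linenumber )
        filename, _, rest = spec.partition(":")
        filename = filename or None
        if rest == "":
            return (filename, None, None)
        if rest.isdigit():
            return (filename, None, int(rest))
        return (filename, rest, None)
    if "." in spec:
        # A dot without a colon marks a filename.
        return (spec, None, None)
    if spec.isdigit():
        return (None, None, int(spec))
    return (None, spec, None)


for sample in ["main.c", "main", "main.c:main", "main.c:134", "odd:", ":.mul"]:
    print(sample, "->", parse_symspec(sample))
```

The leading-colon and trailing-colon cases correspond to the @samp{:.mul}
and @samp{odd:} examples in the table above.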
	899
	900	@node Output
	901	@chapter Interpreting @code{gprof}'s Output
	902
	903	@code{gprof} can produce several different output styles, the
	904	most important of which are described below. The simplest output
	905	styles (file information, execution count, and function and file ordering)
	906	are not described here, but are documented with the respective options
	907	that trigger them.
	908	@xref{Output Options, ,Output Options}.
	909
	910	@menu
	911	* Flat Profile:: The flat profile shows how much time was spent
	912	executing directly in each function.
	913	* Call Graph:: The call graph shows which functions called which
	914	others, and how much time each function used
	915	when its subroutine calls are included.
	916	* Line-by-line:: @code{gprof} can analyze individual source code lines
	917	* Annotated Source:: The annotated source listing displays source code
	918	labeled with execution counts
	919	@end menu
	920
	921
	922	@node Flat Profile
	923	@section The Flat Profile
	924	@cindex flat profile
	925
	926	The @dfn{flat profile} shows the total amount of time your program
	927	spent executing each function. Unless the @samp{-z} option is given,
	928	functions with no apparent time spent in them, and no apparent calls
	929	to them, are not mentioned. Note that if a function was not compiled
	930	for profiling, and didn't run long enough to show up on the program
	931	counter histogram, it will be indistinguishable from a function that
	932	was never called.
	933
	934	This is part of a flat profile for a small program:
	935
	936	@smallexample
	937	@group
	938	Flat profile:
	939
	940	Each sample counts as 0.01 seconds.
	941	  %   cumulative   self              self     total
	942	 time   seconds   seconds    calls  ms/call  ms/call  name
	943	 33.34      0.02     0.02     7208     0.00     0.00  open
	944	 16.67      0.03     0.01      244     0.04     0.12  offtime
	945	 16.67      0.04     0.01        8     1.25     1.25  memccpy
	946	 16.67      0.05     0.01        7     1.43     1.43  write
	947	 16.67      0.06     0.01                             mcount
	948	  0.00      0.06     0.00      236     0.00     0.00  tzset
	949	  0.00      0.06     0.00      192     0.00     0.00  tolower
	950	  0.00      0.06     0.00       47     0.00     0.00  strlen
	951	  0.00      0.06     0.00       45     0.00     0.00  strchr
	952	  0.00      0.06     0.00        1     0.00    50.00  main
	953	  0.00      0.06     0.00        1     0.00     0.00  memcpy
	954	  0.00      0.06     0.00        1     0.00    10.11  print
	955	  0.00      0.06     0.00        1     0.00     0.00  profil
	956	  0.00      0.06     0.00        1     0.00    50.00  report
	957	@dots{}
	958	@end group
	959	@end smallexample
	960
	961	@noindent
	962	The functions are sorted first by decreasing run-time spent in them,
	963	then by decreasing number of calls, then alphabetically by name. The
	964	functions @samp{mcount} and @samp{profil} are part of the profiling
	965	apparatus and appear in every flat profile; their time gives a measure of
	966	the amount of overhead due to profiling.
	967
	968	Just before the column headers, a statement appears indicating
	969	how much time each sample counted as.
	970	This @dfn{sampling period} estimates the margin of error in each of the time
	971	figures. A time figure that is not much larger than this is not
	972	reliable. In this example, each sample counted as 0.01 seconds,
	973	suggesting a 100 Hz sampling rate.
	974	The program's total execution time was 0.06
	975	seconds, as indicated by the @samp{cumulative seconds} field. Since
	976	each sample counted for 0.01 seconds, this means only six samples
	977	were taken during the run. Two of the samples occurred while the
	978	program was in the @samp{open} function, as indicated by the
	979	@samp{self seconds} field. The other four samples
	980	occurred one each in @samp{offtime}, @samp{memccpy}, @samp{write},
	981	and @samp{mcount}.
	982	Since only six samples were taken, none of these values can
	983	be regarded as particularly reliable.
	984	In another run,
	985	the @samp{self seconds} field for
	986	@samp{mcount} might well be @samp{0.00} or @samp{0.02}.
	987	@xref{Sampling Error, ,Statistical Sampling Error},
	988	for a complete discussion.
	989
	990	The remaining functions in the listing (those whose
	991	@samp{self seconds} field is @samp{0.00}) didn't appear
	992	in the histogram samples at all. However, the call graph
	993	indicated that they were called, so they are listed,
	994	sorted in decreasing order by the @samp{calls} field.
	995	Clearly some time was spent executing these functions,
	996	but the paucity of histogram samples prevents any
	997	determination of how much time each took.
	998
	999	Here is what the fields in each line mean:
	1000
	1001	@table @code
	1002	@item % time
	1003	This is the percentage of the total execution time your program spent
	1004	in this function. These should all add up to 100%.
	1005
	1006	@item cumulative seconds
	1007	This is the cumulative total number of seconds the computer spent
	1008	executing this function, plus the time spent in all the functions
	1009	above this one in this table.
	1010
	1011	@item self seconds
	1012	This is the number of seconds accounted for by this function alone.
	1013	The flat profile listing is sorted first by this number.
	1014
	1015	@item calls
	1016	This is the total number of times the function was called. If the
	1017	function was never called, or the number of times it was called cannot
	1018	be determined (probably because the function was not compiled with
	1019	profiling enabled), the @dfn{calls} field is blank.
	1020
	1021	@item self ms/call
	1022	This represents the average number of milliseconds spent in this
	1023	function per call, if this function is profiled. Otherwise, this field
	1024	is blank for this function.
	1025
	1026	@item total ms/call
	1027	This represents the average number of milliseconds spent in this
	1028	function and its descendants per call, if this function is profiled.
	1029	Otherwise, this field is blank for this function.
	1030	This is the only field in the flat profile that uses call graph analysis.
	1031
	1032	@item name
	1033	This is the name of the function. The flat profile is sorted by this
	1034	field alphabetically after the @dfn{self seconds} and @dfn{calls}
	1035	fields are sorted.
	1036	@end table
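
The relationship between histogram samples and the first three columns can
be sketched in a few lines of Python. This is only an illustration under
stated assumptions: the per-function sample counts are the ones inferred
from the example run above, the helper name @code{flat_rows} is
hypothetical, and ties are left in input order rather than being broken by
call count and name as @code{gprof} does.

```python
# Sample counts inferred from the example: six samples of 0.01 seconds each.
SAMPLE_PERIOD = 0.01
samples = [("open", 2), ("offtime", 1), ("memccpy", 1),
           ("write", 1), ("mcount", 1)]


def flat_rows(samples, period):
    """Return (percent, cumulative, self_seconds, name) rows by self time."""
    total = sum(count for _, count in samples)
    # sorted() is stable, so functions with equal counts keep input order.
    ordered = sorted(samples, key=lambda item: -item[1])
    rows = []
    running = 0
    for name, count in ordered:
        running += count
        rows.append((100.0 * count / total,   # % time
                     running * period,        # cumulative seconds
                     count * period,          # self seconds
                     name))
    return rows


rows = flat_rows(samples, SAMPLE_PERIOD)
for percent, cumulative, self_seconds, name in rows:
    print("%6.2f %9.2f %8.2f  %s" % (percent, cumulative, self_seconds, name))
```

The last row's cumulative value is the program's total sampled time, and
the percentages sum to 100 apart from rounding in the printed figures.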
	1037
	1038	@node Call Graph
	1039	@section The Call Graph
	1040	@cindex call graph
	1041
	1042	The @dfn{call graph} shows how much time was spent in each function
	1043	and its children. From this information, you can find functions that,
	1044	while they themselves may not have used much time, called other
	1045	functions that did use unusual amounts of time.
	1046
	1047	Here is a sample call graph from a small program. This call graph came from the
	1048	same @code{gprof} run as the flat profile example in the previous
	1049	section.
	1050
	1051	@smallexample
	1052	@group
	1053	granularity: each sample hit covers 2 byte(s) for 20.00% of 0.05 seconds
	1054
	1055	index % time    self  children    called     name
	1056	                                                 <spontaneous>
	1057	[1]    100.0    0.00    0.05                 start [1]
	1058	                0.00    0.05       1/1           main [2]
	1059	                0.00    0.00       1/2           on_exit [28]
	1060	                0.00    0.00       1/1           exit [59]
	1061	-----------------------------------------------
	1062	                0.00    0.05       1/1           start [1]
	1063	[2]    100.0    0.00    0.05       1         main [2]
	1064	                0.00    0.05       1/1           report [3]
	1065	-----------------------------------------------
	1066	                0.00    0.05       1/1           main [2]
	1067	[3]    100.0    0.00    0.05       1         report [3]
	1068	                0.00    0.03       8/8           timelocal [6]
	1069	                0.00    0.01       1/1           print [9]
	1070	                0.00    0.01       9/9           fgets [12]
	1071	                0.00    0.00      12/34          strncmp <cycle 1> [40]
	1072	                0.00    0.00       8/8           lookup [20]
	1073	                0.00    0.00       1/1           fopen [21]
	1074	                0.00    0.00       8/8           chewtime [24]
	1075	                0.00    0.00       8/16          skipspace [44]
	1076	-----------------------------------------------
	1077	[4]     59.8    0.01    0.02       8+472     <cycle 2 as a whole> [4]
	1078	                0.01    0.02     244+260         offtime <cycle 2> [7]
	1079	                0.00    0.00     236+1           tzset <cycle 2> [26]
	1080	-----------------------------------------------
	1081	@end group
	1082	@end smallexample
	1083
	1084	The lines full of dashes divide this table into @dfn{entries}, one for each
	1085	function. Each entry has one or more lines.
	1086
	1087	In each entry, the primary line is the one that starts with an index number
	1088	in square brackets. The end of this line says which function the entry is
	1089	for. The preceding lines in the entry describe the callers of this
	1090	function and the following lines describe its subroutines (also called
	1091	@dfn{children} when we speak of the call graph).
	1092
	1093	The entries are sorted by time spent in the function and its subroutines.
	1094
	1095	The internal profiling function @code{mcount} (@pxref{Flat Profile, ,The
	1096	Flat Profile}) is never mentioned in the call graph.
	1097
	1098	@menu
	1099	* Primary:: Details of the primary line's contents.
	1100	* Callers:: Details of caller-lines' contents.
	1101	* Subroutines:: Details of subroutine-lines' contents.
	1102	* Cycles:: When there are cycles of recursion,
	1103	such as @code{a} calls @code{b} calls @code{a}@dots{}
	1104	@end menu
	1105
	1106	@node Primary
	1107	@subsection The Primary Line
	1108
	1109	The @dfn{primary line} in a call graph entry is the line that
	1110	describes the function which the entry is about and gives the overall
	1111	statistics for this function.
	1112
	1113	For reference, we repeat the primary line from the entry for function
	1114	@code{report} in our main example, together with the heading line that
	1115	shows the names of the fields:
	1116
	1117	@smallexample
	1118	@group
	1119	index % time    self  children    called     name
	1120	@dots{}
	1121	[3]    100.0    0.00    0.05       1         report [3]
	1122	@end group
	1123	@end smallexample
	1124
	1125	Here is what the fields in the primary line mean:
	1126
	1127	@table @code
	1128	@item index
	1129	Entries are numbered with consecutive integers. Each function
	1130	therefore has an index number, which appears at the beginning of its
	1131	primary line.
	1132
	1133	Each cross-reference to a function, as a caller or subroutine of
	1134	another, gives its index number as well as its name. The index number
	1135	guides you if you wish to look for the entry for that function.
	1136
	1137	@item % time
	1138	This is the percentage of the total time that was spent in this
	1139	function, including time spent in subroutines called from this
	1140	function.
	1141
	1142	The time spent in this function is counted again for the callers of
	1143	this function. Therefore, adding up these percentages is meaningless.
	1144
	1145	@item self
	1146	This is the total amount of time spent in this function. This
	1147	should be identical to the number printed in the @code{self seconds} field
	1148	for this function in the flat profile.
	1149
	1150	@item children
	1151	This is the total amount of time spent in the subroutine calls made by
	1152	this function. This should be equal to the sum of all the @code{self}
	1153	and @code{children} entries of the children listed directly below this
	1154	function.
	1155
	1156	@item called
	1157	This is the number of times the function was called.
	1158
	1159	If the function called itself recursively, there are two numbers,
	1160	separated by a @samp{+}. The first number counts non-recursive calls,
	1161	and the second counts recursive calls.
	1162
	1163	In the example above, the function @code{report} was called once from
	1164	@code{main}.
	1165
	1166	@item name
	1167	This is the name of the current function. The index number is
	1168	repeated after it.
	1169
	1170	If the function is part of a cycle of recursion, the cycle number is
	1171	printed between the function's name and the index number
	1172	(@pxref{Cycles, ,How Mutually Recursive Functions Are Described}).
	1173	For example, if function @code{gnurr} is part of
	1174	cycle number one, and has index number twelve, its primary line would
	1175	end like this:
	1176
	1177	@example
	1178	gnurr <cycle 1> [12]
	1179	@end example
	1180	@end table
	1181
	1182	@node Callers
	1183	@subsection Lines for a Function's Callers
	1184
	1185	A function's entry has a line for each function it was called by.
	1186	These lines' fields correspond to the fields of the primary line, but
	1187	their meanings are different because of the difference in context.
	1188
	1189	For reference, we repeat two lines from the entry for the function
	1190	@code{report}, the primary line and one caller-line preceding it, together
	1191	with the heading line that shows the names of the fields:
	1192
	1193	@smallexample
	1194	index % time    self  children    called     name
	1195	@dots{}
	1196	                0.00    0.05       1/1           main [2]
	1197	[3]    100.0    0.00    0.05       1         report [3]
	1198	@end smallexample
	1199
	1200	Here are the meanings of the fields in the caller-line for @code{report}
	1201	called from @code{main}:
	1202
	1203	@table @code
	1204	@item self
	1205	An estimate of the amount of time spent in @code{report} itself when it was
	1206	called from @code{main}.
	1207
	1208	@item children
	1209	An estimate of the amount of time spent in subroutines of @code{report}
	1210	when @code{report} was called from @code{main}.
	1211
	1212	The sum of the @code{self} and @code{children} fields is an estimate
	1213	of the amount of time spent within calls to @code{report} from @code{main}.
	1214
	1215	@item called
	1216	Two numbers: the number of times @code{report} was called from @code{main},
	1217	followed by the total number of non-recursive calls to @code{report} from
	1218	all its callers.
	1219
	1220	@item name and index number
	1221	The name of the caller of @code{report} to which this line applies,
	1222	followed by the caller's index number.
	1223
	1224	Not all functions have entries in the call graph; some
	1225	options to @code{gprof} request the omission of certain functions.
	1226	When a caller has no entry of its own, it still has caller-lines
	1227	in the entries of the functions it calls.
	1228
	1229	If the caller is part of a recursion cycle, the cycle number is
	1230	printed between the name and the index number.
	1231	@end table
	1232
	1233	If the identity of the callers of a function cannot be determined, a
	1234	dummy caller-line is printed which has @samp{<spontaneous>} as the
	1235	``caller's name'' and all other fields blank. This can happen for
	1236	signal handlers.
	1237	@c What if some calls have determinable callers' names but not all?
	1238	@c FIXME - still relevant?
	1239
	1240	@node Subroutines
	1241	@subsection Lines for a Function's Subroutines
	1242
	1243	A function's entry has a line for each of its subroutines---in other
	1244	words, a line for each other function that it called. These lines'
	1245	fields correspond to the fields of the primary line, but their meanings
	1246	are different because of the difference in context.
	1247
	1248	For reference, we repeat two lines from the entry for the function
	1249	@code{main}, the primary line and a line for a subroutine, together
	1250	with the heading line that shows the names of the fields:
	1251
	1252	@smallexample
	1253	index % time    self  children    called     name
	1254	@dots{}
	1255	[2]    100.0    0.00    0.05       1         main [2]
	1256	                0.00    0.05       1/1           report [3]
	1257	@end smallexample
	1258
	1259	Here are the meanings of the fields in the subroutine-line for @code{main}
	1260	calling @code{report}:
	1261
	1262	@table @code
	1263	@item self
	1264	An estimate of the amount of time spent directly within @code{report}
	1265	when @code{report} was called from @code{main}.
	1266
	1267	@item children
	1268	An estimate of the amount of time spent in subroutines of @code{report}
	1269	when @code{report} was called from @code{main}.
	1270
	1271	The sum of the @code{self} and @code{children} fields is an estimate
	1272	of the total time spent in calls to @code{report} from @code{main}.
	1273
	1274	@item called
	1275	Two numbers, the number of calls to @code{report} from @code{main}
	1276	followed by the total number of non-recursive calls to @code{report}.
	1277	This ratio is used to determine how much of @code{report}'s @code{self}
	1278	and @code{children} time gets credited to @code{main}.
	1279	@xref{Assumptions, ,Estimating @code{children} Times}.
	1280
	1281	@item name
	1282	The name of the subroutine of @code{main} to which this line applies,
	1283	followed by the subroutine's index number.
	1284
	1285	If the subroutine is part of a recursion cycle, the cycle number is
	1286	printed between the name and the index number.
	1287	@end table
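
The way the @code{called} ratio apportions a subroutine's time to one of
its callers can be sketched as follows. This is a minimal illustration of
proportional crediting, not code from @code{gprof}; the function name
@code{credited_time} and the second set of figures are hypothetical.

```python
def credited_time(child_self, child_children, calls_from_parent, total_calls):
    """Credit a child's self and children time to one parent by call share."""
    if total_calls == 0:
        return (0.0, 0.0)
    fraction = calls_from_parent / total_calls
    return (child_self * fraction, child_children * fraction)


# report [3] from the example: self 0.00, children 0.05, called 1/1 by main,
# so main is credited with all of report's time.
report_share = credited_time(0.00, 0.05, 1, 1)

# Hypothetical subroutine called 8 times by this parent out of 16 calls:
# the parent is credited with half of the subroutine's time.
half_share = credited_time(0.04, 0.02, 8, 16)

print("self %.2f children %.2f" % report_share)
print("self %.2f children %.2f" % half_share)
```

The sketch relies on the assumption that every call to a function costs
the same on average, regardless of who made the call.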
	1288
	1289	@node Cycles
	1290	@subsection How Mutually Recursive Functions Are Described
	1291	@cindex cycle
	1292	@cindex recursion cycle
	1293
	1294	The graph may be complicated by the presence of @dfn{cycles of
	1295	recursion} in the call graph. A cycle exists if a function calls
	1296	another function that (directly or indirectly) calls (or appears to
	1297	call) the original function. For example: if @code{a} calls @code{b},
	1298	and @code{b} calls @code{a}, then @code{a} and @code{b} form a cycle.
	1299
	1300	Whenever there are call paths both ways between a pair of functions, they
	1301	belong to the same cycle. If @code{a} and @code{b} call each other and
	1302	@code{b} and @code{c} call each other, all three make one cycle. Note that
	1303	even if @code{b} only calls @code{a} if it was not called from @code{a},
	1304	@code{gprof} cannot determine this, so @code{a} and @code{b} are still
	1305	considered a cycle.
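
The grouping rule described above (call paths both ways between two
functions) can be sketched in Python as follows. This is an illustrative
sketch only and makes no claim about how @code{gprof} itself finds cycles;
the helper names @code{reachable} and @code{find_cycles} and the sample
arcs are hypothetical. A function that only calls itself is not reported
as a cycle here, since direct recursion is shown with the @samp{+}
notation instead.

```python
def reachable(arcs, start):
    """Return the set of functions reachable from start by one or more calls."""
    seen = set()
    stack = [callee for caller, callee in arcs if caller == start]
    while stack:
        func = stack.pop()
        if func not in seen:
            seen.add(func)
            stack.extend(callee for caller, callee in arcs if caller == func)
    return seen


def find_cycles(arcs):
    """Group functions with call paths both ways into sorted lists of names."""
    funcs = sorted(set(name for arc in arcs for name in arc))
    reach = dict((f, reachable(arcs, f)) for f in funcs)
    cycles = []
    assigned = set()
    for f in funcs:
        if f in assigned or f not in reach[f]:
            continue  # already grouped, or f can never get back to itself
        members = [g for g in funcs if g in reach[f] and f in reach[g]]
        assigned.update(members)
        if len(members) > 1:
            cycles.append(members)
    return cycles


# a and b call each other, and b and c call each other: one cycle of three.
arcs = [("main", "a"), ("a", "b"), ("b", "a"),
        ("b", "c"), ("c", "b"), ("c", "d")]
cycles = find_cycles(arcs)
print(cycles)
```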
	1306
	1307	The cycles are numbered with consecutive integers. When a function
	1308	belongs to a cycle, each time the function name appears in the call graph
	1309	it is followed by @samp{<cycle @var{number}>}.
	1310
	1311	The reason cycles matter is that they make the time values in the call
	1312	graph paradoxical. The ``time spent in children'' of @code{a} should
	1313	include the time spent in its subroutine @code{b} and in @code{b}'s
	1314	subroutines---but one of @code{b}'s subroutines is @code{a}! How much of
	1315	@code{a}'s time should be included in the children of @code{a}, when
	1316	@code{a} is indirectly recursive?
	1317
	1318	The way @code{gprof} resolves this paradox is by creating a single entry
	1319	for the cycle as a whole. The primary line of this entry describes the
	1320	total time spent directly in the functions of the cycle. The
	1321	``subroutines'' of the cycle are the individual functions of the cycle, and
	1322	all other functions that were called directly by them. The ``callers'' of
	1323	the cycle are the functions, outside the cycle, that called functions in
	1324	the cycle.
	1325
	1326	Here is an example portion of a call graph which shows a cycle containing
	1327	functions @code{a} and @code{b}. The cycle was entered by a call to
	1328	@code{a} from @code{main}; both @code{a} and @code{b} called @code{c}.
	1329
	1330	@smallexample
	1331	index  % time    self  children called     name
	1332	----------------------------------------
	1333	                 1.77        0    1/1        main [2]
	1334	[3]     91.71    1.77        0    1+5    <cycle 1 as a whole> [3]
	1335	                 1.02        0    3          b <cycle 1> [4]
	1336	                 0.75        0    2          a <cycle 1> [5]
	1337	----------------------------------------
	1338	                                  3          a <cycle 1> [5]
	1339	[4]     52.85    1.02        0    0      b <cycle 1> [4]
	1340	                                  2          a <cycle 1> [5]
	1341	                    0        0    3/6        c [6]
	1342	----------------------------------------
	1343	                 1.77        0    1/1        main [2]
	1344	                                  2          b <cycle 1> [4]
	1345	[5]     38.86    0.75        0    1      a <cycle 1> [5]
	1346	                                  3          b <cycle 1> [4]
	1347	                    0        0    3/6        c [6]
	1348	----------------------------------------
	1349	@end smallexample
	1350
	1351	@noindent
	1352	(The entire call graph for this program contains in addition an entry for
	1353	@code{main}, which calls @code{a}, and an entry for @code{c}, with callers
	1354	@code{a} and @code{b}.)
	1355
	1356	@smallexample
	1357	index  % time    self  children called     name
	1358	                                             <spontaneous>
	1359	[1]    100.00       0     1.93    0      start [1]
	1360	                 0.16     1.77    1/1        main [2]
	1361	----------------------------------------
	1362	                 0.16     1.77    1/1        start [1]
	1363	[2]    100.00    0.16     1.77    1      main [2]
	1364	                 1.77        0    1/1        a <cycle 1> [5]
	1365	----------------------------------------
	1366	                 1.77        0    1/1        main [2]
	1367	[3]     91.71    1.77        0    1+5    <cycle 1 as a whole> [3]
	1368	                 1.02        0    3          b <cycle 1> [4]
	1369	                 0.75        0    2          a <cycle 1> [5]
	1370	                    0        0    6/6        c [6]
	1371	----------------------------------------
	1372	                                  3          a <cycle 1> [5]
	1373	[4]     52.85    1.02        0    0      b <cycle 1> [4]
	1374	                                  2          a <cycle 1> [5]
	1375	                    0        0    3/6        c [6]
	1376	----------------------------------------
	1377	                 1.77        0    1/1        main [2]
	1378	                                  2          b <cycle 1> [4]
	1379	[5]     38.86    0.75        0    1      a <cycle 1> [5]
	1380	                                  3          b <cycle 1> [4]
	1381	                    0        0    3/6        c [6]
	1382	----------------------------------------
	1383	                    0        0    3/6        b <cycle 1> [4]
	1384	                    0        0    3/6        a <cycle 1> [5]
	1385	[6]      0.00       0        0    6      c [6]
	1386	----------------------------------------
	1387	@end smallexample
	1388
	1389	The @code{self} field of the cycle's primary line is the total time
	1390	spent in all the functions of the cycle. It equals the sum of the
	1391	@code{self} fields for the individual functions in the cycle, found
	1392	in the subroutine lines for these functions in the cycle's entry.
	1393
	1394	The @code{children} fields of the cycle's primary line and subroutine lines
	1395	count only subroutines outside the cycle. Even though @code{a} calls
	1396	@code{b}, the time spent in those calls to @code{b} is not counted in
	1397	@code{a}'s @code{children} time. Thus, we do not encounter the problem of
	1398	what to do when the time in those calls to @code{b} includes indirect
	1399	recursive calls back to @code{a}.
	1400
	1401	The @code{children} field of a caller-line in the cycle's entry estimates
	1402	the amount of time spent @emph{in the whole cycle}, and its other
	1403	subroutines, on the times when that caller called a function in the cycle.
	1404
	1405	The @code{called} field in the primary line for the cycle has two numbers:
	1406	first, the number of times functions in the cycle were called by functions
	1407	outside the cycle; second, the number of times they were called by
	1408	functions in the cycle (including times when a function in the cycle calls
	1409	itself). This is a generalization of the usual split into non-recursive and
	1410	recursive calls.
	1411
	1412	The @code{called} field of a subroutine-line for a cycle member in the
	1413	cycle's entry says how many times that function was called from functions in
	1414	the cycle. The total of all these is the second number in the primary line's
	1415	@code{called} field.
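
As a worked check of the two paragraphs above, this Python sketch recomputes
the cycle's @code{called} field from the call counts shown in the example
(@code{main} calls @code{a} once, @code{a} calls @code{b} three times,
@code{b} calls @code{a} twice, and each calls @code{c} three times). The
function name @code{cycle_called} and the data layout are assumptions made
for illustration.

```python
def cycle_called(call_counts, members):
    """Return (calls from outside the cycle, calls per member from inside)."""
    external = 0
    internal = dict((name, 0) for name in members)
    for caller, callee, count in call_counts:
        if callee not in members:
            continue  # calls that leave the cycle do not count here
        if caller in members:
            internal[callee] += count
        else:
            external += count
    return external, internal


call_counts = [("main", "a", 1), ("a", "b", 3), ("b", "a", 2),
               ("a", "c", 3), ("b", "c", 3)]
external, internal = cycle_called(call_counts, set(["a", "b"]))

# Primary line of the cycle entry: outside calls + inside calls.
print("%d+%d" % (external, sum(internal.values())))
# Subroutine lines for the cycle members: calls from inside the cycle.
for name in sorted(internal):
    print(name, internal[name])
```

The per-member counts are the values in the subroutine lines for @code{a}
and @code{b}, and their sum is the second number in the primary line.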
	1416
	1417	In the individual entry for a function in a cycle, the other functions in
	1418	the same cycle can appear as subroutines and as callers. These lines show
	1419	how many times each function in the cycle called or was called from each other
	1420	function in the cycle. The @code{self} and @code{children} fields in these
	1421	lines are blank because of the difficulty of defining meanings for them
	1422	when recursion is going on.
	1423
	1424	@node Line-by-line
	1425	@section Line-by-line Profiling
	1426
	1427	@code{gprof}'s @samp{-l} option causes the program to perform
	1428	@dfn{line-by-line} profiling. In this mode, histogram
	1429	samples are assigned not to functions, but to individual
	1430	lines of source code. This only works with programs compiled with
	1431	older versions of the @code{gcc} compiler. Newer versions of @code{gcc}
	1432	use a different program, @code{gcov}, to display line-by-line
	1433	profiling information.
	1434
	1435	With the older versions of @code{gcc} the program usually has to be
	1436	compiled with a @samp{-g} option, in addition to @samp{-pg}, in order
	1437	to generate debugging symbols for tracking source code lines.
	1438	Note, in much older versions of @code{gcc} the program had to be
	1439	compiled with the @samp{-a} command-line option as well.
	1440
	1441	The flat profile is the most useful output table
	1442	in line-by-line mode.
	1443	The call graph isn't as useful as normal, since
	1444	the current version of @code{gprof} does not propagate
	1445	call graph arcs from source code lines to the enclosing function.
	1446	The call graph does, however, show each line of code
	1447	that called each function, along with a count.
	1448
	1449	Here is a section of @code{gprof}'s output, without line-by-line profiling.
	1450	Note that @code{ct_init} accounted for four histogram hits, and
	1451	13327 calls to @code{init_block}.
	1452
	1453	@smallexample
	1454	Flat profile:
	1455
	1456	Each sample counts as 0.01 seconds.
	1457	  %   cumulative   self              self     total
	1458	 time   seconds   seconds    calls  us/call  us/call  name
	1459	 30.77      0.13     0.04     6335     6.31     6.31  ct_init
	1460
	1461
	1462	Call graph (explanation follows)
	1463
	1464
	1465	granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
	1466
	1467	index % time    self  children    called     name
	1468
	1469	                0.00    0.00       1/13496       name_too_long
	1470	                0.00    0.00      40/13496       deflate
	1471	                0.00    0.00     128/13496       deflate_fast
	1472	                0.00    0.00   13327/13496       ct_init
	1473	[7]      0.0    0.00    0.00   13496         init_block
	1474
	1475	@end smallexample
	1476
	1477	Now let's look at some of @code{gprof}'s output from the same program run,
	1478	this time with line-by-line profiling enabled. Note that @code{ct_init}'s
	1479	four histogram hits are broken down into four lines of source code---one hit
	1480	occurred on each of lines 349, 351, 382 and 385. In the call graph,
	1481	note how
	1482	@code{ct_init}'s 13327 calls to @code{init_block} are broken down
	1483	into one call from line 396, 3071 calls from line 384, 3730 calls
	1484	from line 385, and 6525 calls from line 387.
	1485
	1486	@smallexample
	1487	Flat profile:
	1488
	1489	Each sample counts as 0.01 seconds.
	1490	  %   cumulative   self
	1491	 time   seconds   seconds    calls  name
	1492	  7.69      0.10     0.01           ct_init (trees.c:349)
	1493	  7.69      0.11     0.01           ct_init (trees.c:351)
	1494	  7.69      0.12     0.01           ct_init (trees.c:382)
	1495	  7.69      0.13     0.01           ct_init (trees.c:385)
	1496
	1497
	1498	Call graph (explanation follows)
	1499
	1500
	1501	granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
	1502
	1503	      % time    self  children    called     name
	1504	
	1505	                0.00    0.00       1/13496       name_too_long (gzip.c:1440)
	1506	                0.00    0.00       1/13496       deflate (deflate.c:763)
	1507	                0.00    0.00       1/13496       ct_init (trees.c:396)
	1508	                0.00    0.00       2/13496       deflate (deflate.c:727)
	1509	                0.00    0.00       4/13496       deflate (deflate.c:686)
	1510	                0.00    0.00       5/13496       deflate (deflate.c:675)
	1511	                0.00    0.00      12/13496       deflate (deflate.c:679)
	1512	                0.00    0.00      16/13496       deflate (deflate.c:730)
	1513	                0.00    0.00     128/13496       deflate_fast (deflate.c:654)
	1514	                0.00    0.00    3071/13496       ct_init (trees.c:384)
	1515	                0.00    0.00    3730/13496       ct_init (trees.c:385)
	1516	                0.00    0.00    6525/13496       ct_init (trees.c:387)
	1517	[6]      0.0    0.00    0.00   13496             init_block (trees.c:408)
	1518
	1519	@end smallexample
	1520
	1521
	1522	@node Annotated Source
	1523	@section The Annotated Source Listing
	1524
	1525	@code{gprof}'s @samp{-A} option triggers an annotated source listing,
	1526	which lists the program's source code, each function labeled with the
	1527	number of times it was called. You may also need to specify the
	1528	@samp{-I} option, if @code{gprof} can't find the source code files.
	1529
	1530	With older versions of @code{gcc} compiling with @samp{gcc @dots{} -g
	1531	-pg -a} augments your program with basic-block counting code, in
	1532	addition to function counting code. This enables @code{gprof} to
	1533	determine how many times each line of code was executed. With newer
	1534	versions of @code{gcc} support for displaying basic-block counts is
	1535	provided by the @code{gcov} program.
	1536
	1537	For example, consider the following function, taken from gzip,
	1538	with line numbers added:
	1539
	1540	@smallexample
	1541	 1 ulg updcrc(s, n)
	1542	 2     uch *s;
	1543	 3     unsigned n;
	1544	 4 @{
	1545	 5     register ulg c;
	1546	 6
	1547	 7     static ulg crc = (ulg)0xffffffffL;
	1548	 8
	1549	 9     if (s == NULL) @{
	1550	10         c = 0xffffffffL;
	1551	11     @} else @{
	1552	12         c = crc;
	1553	13         if (n) do @{
	1554	14             c = crc_32_tab[...];
	1555	15         @} while (--n);
	1556	16     @}
	1557	17     crc = c;
	1558	18     return c ^ 0xffffffffL;
	1559	19 @}
	1560
	1561	@end smallexample
	1562
	1563	@code{updcrc} has at least five basic-blocks.
	1564	One is the function itself. The
	1565	@code{if} statement on line 9 generates two more basic-blocks, one
	1566	for each branch of the @code{if}. A fourth basic-block results from
	1567	the @code{if} on line 13, and the contents of the @code{do} loop form
	1568	the fifth basic-block. The compiler may also generate additional
	1569	basic-blocks to handle various special cases.
	1570
	1571	A program augmented for basic-block counting can be analyzed with
	1572	@samp{gprof -l -A}.
	1573	The @samp{-x} option is also helpful,
	1574	to ensure that each line of code is labeled at least once.
	1575	Here is @code{updcrc}'s
	1576	annotated source listing for a sample @code{gzip} run:
	1577
	1578	@smallexample
	1579	                ulg updcrc(s, n)
	1580	                    uch *s;
	1581	                    unsigned n;
	1582	            2 ->@{
	1583	                    register ulg c;
	1584	
	1585	                    static ulg crc = (ulg)0xffffffffL;
	1586	
	1587	            2 ->    if (s == NULL) @{
	1588	            1 ->        c = 0xffffffffL;
	1589	            1 ->    @} else @{
	1590	            1 ->        c = crc;
	1591	            1 ->        if (n) do @{
	1592	        26312 ->            c = crc_32_tab[...];
	1593	26312,1,26311 ->        @} while (--n);
	1594	                    @}
	1595	            2 ->    crc = c;
	1596	            2 ->    return c ^ 0xffffffffL;
	1597	            2 ->@}
	1598	@end smallexample
	1599
	1600	In this example, the function was called twice, passing once through
	1601	each branch of the @code{if} statement. The body of the @code{do}
	1602	loop was executed a total of 26312 times. Note how the @code{while}
	1603	statement is annotated. It began execution 26312 times, once for
	1604	each iteration through the loop. One of those times (the last time)
	1605	it exited, while it branched back to the beginning of the loop 26311 times.
	1606
	1607	@node Inaccuracy
	1608	@chapter Inaccuracy of @code{gprof} Output
	1609
	1610	@menu
	1611	* Sampling Error:: Statistical margins of error
	1612	* Assumptions:: Estimating children times
	1613	@end menu
	1614
	1615	@node Sampling Error
	1616	@section Statistical Sampling Error
	1617
	1618	The run-time figures that @code{gprof} gives you are based on a sampling
	1619	process, so they are subject to statistical inaccuracy. If a function runs
	1620	only a small amount of time, so that on the average the sampling process
	1621	ought to catch that function in the act only once, there is a pretty good
	1622	chance it will actually find that function zero times, or twice.
	1623
	1624	By contrast, the number-of-calls and basic-block figures are derived
	1625	by counting, not sampling. They are completely accurate and will not
	1626	vary from run to run if your program is deterministic and single
	1627	threaded. In multi-threaded applications, or single threaded
	1628	applications that link with multi-threaded libraries, the counts are
	1629	only deterministic if the counting function is thread-safe. (Note:
	1630	beware that the @code{mcount} counting function in glibc is @emph{not}
	1631	thread-safe.) @xref{Implementation, ,Implementation of Profiling}.
	1632
	1633	The @dfn{sampling period} that is printed at the beginning of the flat
	1634	profile says how often samples are taken. The rule of thumb is that a
	1635	run-time figure is accurate if it is considerably bigger than the sampling
	1636	period.
	1637
	1638	The actual amount of error can be predicted.
	1639	For @var{n} samples, the @emph{expected} error
	1640	is the square-root of @var{n}. For example,
	1641	if the sampling period is 0.01 seconds and @code{foo}'s run-time is 1 second,
	1642	@var{n} is 100 samples (1 second/0.01 seconds), sqrt(@var{n}) is 10 samples, so
	1643	the expected error in @code{foo}'s run-time is 0.1 seconds (10*0.01 seconds),
	1644	or ten percent of the observed value.
	1645	Again, if the sampling period is 0.01 seconds and @code{bar}'s run-time is
	1646	100 seconds, @var{n} is 10000 samples, sqrt(@var{n}) is 100 samples, so
	1647	the expected error in @code{bar}'s run-time is 1 second,
	1648	or one percent of the observed value.
	1649	It is likely to
	1650	vary this much @emph{on the average} from one profiling run to the next.
	1651	(@emph{Sometimes} it will vary more.)
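
The arithmetic above can be checked with a short script. This is only a
sketch in Python; the helper name @code{expected_error} is made up for
illustration and is not part of @code{gprof}.

@example
import math

def expected_error(run_time, sampling_period):
    # Number of samples that landed in the function.
    n = run_time / sampling_period
    # Expected error is sqrt(n) samples, converted back to seconds.
    return math.sqrt(n) * sampling_period

# The two cases from the text, both with a 0.01 second period.
for name, run_time in (("foo", 1.0), ("bar", 100.0)):
    err = expected_error(run_time, 0.01)
    print(name, run_time, "+/-", round(err, 3), "seconds")
@end example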
	1652
	1653	This does not mean that a small run-time figure is devoid of information.
	1654	If the program's @emph{total} run-time is large, a small run-time for one
	1655	function does tell you that that function used an insignificant fraction of
	1656	the whole program's time. Usually this means it is not worth optimizing.
	1657
	1658	One way to get more accuracy is to give your program more (but similar)
	1659	input data so it will take longer. Another way is to combine the data from
	1660	several runs, using the @samp{-s} option of @code{gprof}. Here is how:
	1661
	1662	@enumerate
	1663	@item
	1664	Run your program once.
	1665
	1666	@item
	1667	Issue the command @samp{mv gmon.out gmon.sum}.
	1668
	1669	@item
	1670	Run your program again, the same as before.
	1671
	1672	@item
	1673	Merge the new data in @file{gmon.out} into @file{gmon.sum} with this command:
	1674
	1675	@example
	1676	gprof -s @var{executable-file} gmon.out gmon.sum
	1677	@end example
	1678
	1679	@item
	1680	Repeat the last two steps as often as you wish.
	1681
	1682	@item
	1683	Analyze the cumulative data using this command:
	1684
	1685	@example
	1686	gprof @var{executable-file} gmon.sum > @var{output-file}
	1687	@end example
	1688	@end enumerate
	1689
	1690	@node Assumptions
	1691	@section Estimating @code{children} Times
	1692
	1693	Some of the figures in the call graph are estimates---for example, the
	1694	@code{children} time values and all the time figures in caller and
	1695	subroutine lines.
	1696
	1697	There is no direct information about these measurements in the profile
	1698	data itself. Instead, @code{gprof} estimates them by making an assumption
	1699	about your program that might or might not be true.
	1700
	1701	The assumption made is that the average time spent in each call to any
	1702	function @code{foo} is not correlated with who called @code{foo}. If
	1703	@code{foo} used 5 seconds in all, and 2/5 of the calls to @code{foo} came
	1704	from @code{a}, then @code{foo} contributes 2 seconds to @code{a}'s
	1705	@code{children} time, by assumption.
	1706
	1707	This assumption is usually true enough, but for some programs it is far
	1708	from true. Suppose that @code{foo} returns very quickly when its argument
	1709	is zero; suppose that @code{a} always passes zero as an argument, while
	1710	other callers of @code{foo} pass other arguments. In this program, all the
	1711	time spent in @code{foo} is in the calls from callers other than @code{a}.
	1712	But @code{gprof} has no way of knowing this; it will blindly and
	1713	incorrectly charge 2 seconds of time in @code{foo} to the children of
	1714	@code{a}.
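
The proportional charging, and the way it goes wrong in the case just
described, can be sketched as follows. The Python helper
@code{children_share} is hypothetical, and the call counts (40 of 100
calls coming from @code{a}) are assumptions chosen only to match the
2/5 figure above.

@example
def children_share(callee_total, caller_calls, all_calls):
    # Charge the callee's time in proportion to the caller's
    # share of the calls, regardless of what each call cost.
    return callee_total * caller_calls / all_calls

# foo used 5 seconds in all; a made 40 of its 100 calls.
estimate_for_a = children_share(5.0, 40, 100)

# If every call from a returned at once, the true figure is zero.
true_for_a = 0.0
print("estimated:", estimate_for_a, "true:", true_for_a)
@end example

The estimate depends only on call counts, which is why the profile data
cannot reveal the discrepancy.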
	1715
	1716	@c FIXME - has this been fixed?
	1717	We hope some day to put more complete data into @file{gmon.out}, so that
	1718	this assumption is no longer needed, if we can figure out how. For the
	1719	novice, the estimated figures are usually more useful than misleading.
	1720
	1721	@node How do I?
	1722	@chapter Answers to Common Questions
	1723
	1724	@table @asis
	1725	@item How can I get more exact information about hot spots in my program?
	1726
	1727	Looking at the per-line call counts only tells part of the story.
	1728	Because @code{gprof} can only report call times and counts by function,
	1729	the best way to get finer-grained information on where the program
	1730	is spending its time is to re-factor large functions into sequences
	1731	of calls to smaller ones. Beware however that this can introduce
	1732	artificial hot spots since compiling with @samp{-pg} adds a significant
	1733	overhead to function calls. An alternative solution is to use a
	1734	non-intrusive profiler, e.g.@: oprofile.
	1735
	1736	@item How do I find which lines in my program were executed the most times?
	1737
	1738	Use the @code{gcov} program.
	1739
	1740	@item How do I find which lines in my program called a particular function?
	1741
	1742	Use @samp{gprof -l} and look up the function in the call graph.
	1743	The callers will be broken down by function and line number.
	1744
	1745	@item How do I analyze a program that runs for less than a second?
	1746
	1747	Try using a shell script like this one:
	1748
	1749	@example
	1750	for i in `seq 1 100`; do
	1751	fastprog
	1752	mv gmon.out gmon.out.$i
	1753	done
	1754
	1755	gprof -s fastprog gmon.out.*
	1756
	1757	gprof fastprog gmon.sum
	1758	@end example
	1759
	1760	If your program is completely deterministic, all the call counts
	1761	will be simple multiples of 100 (i.e., a function called once in
	1762	each run will appear with a call count of 100).
	1763
	1764	@end table
	1765
	1766	@node Incompatibilities
	1767	@chapter Incompatibilities with Unix @code{gprof}
	1768
	1769	@sc{gnu} @code{gprof} and Berkeley Unix @code{gprof} use the same data
	1770	file @file{gmon.out}, and provide essentially the same information. But
	1771	there are a few differences.
	1772
	1773	@itemize @bullet
	1774	@item
	1775	@sc{gnu} @code{gprof} uses a new, generalized file format with support
	1776	for basic-block execution counts and non-realtime histograms. A magic
	1777	cookie and version number allow @code{gprof} to easily identify
	1778	new style files. Old BSD-style files can still be read.
	1779	@xref{File Format, ,Profiling Data File Format}.
	1780
	1781	@item
	1782	For a recursive function, Unix @code{gprof} lists the function as a
	1783	parent and as a child, with a @code{calls} field that lists the number
	1784	of recursive calls. @sc{gnu} @code{gprof} omits these lines and puts
	1785	the number of recursive calls in the primary line.
	1786
	1787	@item
	1788	When a function is suppressed from the call graph with @samp{-e}, @sc{gnu}
	1789	@code{gprof} still lists it as a subroutine of functions that call it.
	1790
	1791	@item
	1792	@sc{gnu} @code{gprof} accepts the @samp{-k} option with its argument
	1793	in the form @samp{from/to}, instead of @samp{from to}.
	1794
	1795	@item
	1796	In the annotated source listing,
	1797	if there are multiple basic blocks on the same line,
	1798	@sc{gnu} @code{gprof} prints all of their counts, separated by commas.
	1799
	1800	@ignore - it does this now
	1801	@item
	1802	The function names printed in @sc{gnu} @code{gprof} output do not include
	1803	the leading underscores that are added internally to the front of all
	1804	C identifiers on many operating systems.
	1805	@end ignore
	1806
	1807	@item
	1808	The blurbs, field widths, and output formats are different. @sc{gnu}
	1809	@code{gprof} prints blurbs after the tables, so that you can see the
	1810	tables without skipping the blurbs.
	1811	@end itemize
	1812
	1813	@node Details
	1814	@chapter Details of Profiling
	1815
	1816	@menu
	1817	* Implementation:: How a program collects profiling information
	1818	* File Format:: Format of @samp{gmon.out} files
	1819	* Internals:: @code{gprof}'s internal operation
	1820	* Debugging:: Using @code{gprof}'s @samp{-d} option
	1821	@end menu
	1822
	1823	@node Implementation
	1824	@section Implementation of Profiling
	1825
	1826	Profiling works by changing how every function in your program is compiled
	1827	so that when it is called, it will stash away some information about where
	1828	it was called from. From this, the profiler can figure out what function
	1829	called it, and can count how many times it was called. This change is made
	1830	by the compiler when your program is compiled with the @samp{-pg} option,
	1831	which causes every function to call @code{mcount}
	1832	(or @code{_mcount}, or @code{__mcount}, depending on the OS and compiler)
	1833	as one of its first operations.
	1834
	1835	The @code{mcount} routine, included in the profiling library,
	1836	is responsible for recording in an in-memory call graph table
	1837	both its parent routine (the child in the call arc) and its parent's parent (the caller). This is
	1838	typically done by examining the stack frame to find both
	1839	the address of the child, and the return address in the original parent.
	1840	Since this is a very machine-dependent operation, @code{mcount}
	1841	itself is typically a short assembly-language stub routine
	1842	that extracts the required
	1843	information, and then calls @code{__mcount_internal}
	1844	(a normal C function) with two arguments---@code{frompc} and @code{selfpc}.
	1845	@code{__mcount_internal} is responsible for maintaining
	1846	the in-memory call graph, which records @code{frompc}, @code{selfpc},
	1847	and the number of times each of these call arcs was traversed.
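
As a rough illustration of the table described above, the following
Python sketch counts traversals per (@code{frompc}, @code{selfpc})
pair. The function @code{record_call} and the addresses are invented
for the example; the real @code{__mcount_internal} is C code with its
own data layout.

@example
def record_call(arcs, frompc, selfpc):
    # One table entry per (caller address, callee address) pair.
    key = (frompc, selfpc)
    arcs[key] = arcs.get(key, 0) + 1

arcs = dict()
# Made-up addresses: two call sites reaching the same callee.
for frompc in (0x4005d0, 0x4005d0, 0x4005d0, 0x400610):
    record_call(arcs, frompc, 0x400720)
for (frompc, selfpc), count in sorted(arcs.items()):
    print(hex(frompc), "->", hex(selfpc), count)
@end example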
	1848
	1849	GCC Version 2 provides a magical function (@code{__builtin_return_address}),
	1850	which allows a generic @code{mcount} function to extract the
	1851	required information from the stack frame. However, on some
	1852	architectures, most notably the SPARC, using this builtin can be
	1853	very computationally expensive, and an assembly language version
	1854	of @code{mcount} is used for performance reasons.
	1855
	1856	Number-of-calls information for library routines is collected by using a
	1857	special version of the C library. The programs in it are the same as in
	1858	the usual C library, but they were compiled with @samp{-pg}. If you
	1859	link your program with @samp{gcc @dots{} -pg}, it automatically uses the
	1860	profiling version of the library.
	1861
	1862	Profiling also involves watching your program as it runs, and keeping a
	1863	histogram of where the program counter happens to be every now and then.
	1864	Typically the program counter is looked at around 100 times per second of
	1865	run time, but the exact frequency may vary from system to system.
	1866
	1867	This is done in one of two ways. Most UNIX-like operating systems
	1868	provide a @code{profil()} system call, which registers a memory
	1869	array with the kernel, along with a scale
	1870	factor that determines how the program's address space maps
	1871	into the array.
	1872	Typical scaling values cause every 2 to 8 bytes of address space
	1873	to map into a single array slot.
	1874	On every tick of the system clock
	1875	(assuming the profiled program is running), the value of the
	1876	program counter is examined and the corresponding slot in
	1877	the memory array is incremented. Since this is done in the kernel,
	1878	which had to interrupt the process anyway to handle the clock
	1879	interrupt, very little additional system overhead is required.
	1880
	1881	However, some operating systems, most notably Linux 2.0 (and earlier),
	1882	do not provide a @code{profil()} system call. On such a system,
	1883	arrangements are made for the kernel to periodically deliver
	1884	a signal to the process (typically via @code{setitimer()}),
	1885	which then performs the same operation of examining the
	1886	program counter and incrementing a slot in the memory array.
	1887	Since this method requires a signal to be delivered to
	1888	user space every time a sample is taken, it uses considerably
	1889	more overhead than kernel-based profiling. Also, due to the
	1890	added delay required to deliver the signal, this method is
	1891	less accurate as well.
	1892
	1893	A special startup routine allocates memory for the histogram and
	1894	either calls @code{profil()} or sets up
	1895	a clock signal handler.
	1896	This routine (@code{monstartup}) can be invoked in several ways.
	1897	On Linux systems, a special profiling startup file @code{gcrt0.o},
	1898	which invokes @code{monstartup} before @code{main},
	1899	is used instead of the default @code{crt0.o}.
	1900	Use of this special startup file is one of the effects
	1901	of using @samp{gcc @dots{} -pg} to link.
	1902	On SPARC systems, no special startup files are used.
	1903	Rather, the @code{mcount} routine, when it is invoked for
	1904	the first time (typically when @code{main} is called),
	1905	calls @code{monstartup}.
	1906
	1907	If the compiler's @samp{-a} option was used, basic-block counting
	1908	is also enabled. Each object file is then compiled with a static array
	1909	of counts, initially zero.
	1910	In the executable code, every time a new basic-block begins
	1911	(e.g., when an @code{if} statement appears), an extra instruction
	1912	is inserted to increment the corresponding count in the array.
	1913	At compile time, a paired array was constructed that recorded
	1914	the starting address of each basic-block. Taken together,
	1915	the two arrays record the starting address of every basic-block,
	1916	along with the number of times it was executed.
	1917
	1918	The profiling library also includes a function (@code{mcleanup}) which is
	1919	typically registered using @code{atexit()} to be called as the
	1920	program exits, and is responsible for writing the file @file{gmon.out}.
	1921	Profiling is turned off, various headers are output, and the histogram
	1922	is written, followed by the call-graph arcs and the basic-block counts.
	1923
	1924	The output from @code{gprof} gives no indication of parts of your program that
	1925	are limited by I/O or swapping bandwidth. This is because samples of the
	1926	program counter are taken at fixed intervals of the program's run time.
	1927	Therefore, the
	1928	time measurements in @code{gprof} output say nothing about time that your
	1929	program was not running. For example, a part of the program that creates
	1930	so much data that it cannot all fit in physical memory at once may run very
	1931	slowly due to thrashing, but @code{gprof} will say it uses little time. On
	1932	the other hand, sampling by run time has the advantage that the amount of
	1933	load due to other users won't directly affect the output you get.
	1934
	1935	@node File Format
	1936	@section Profiling Data File Format
	1937
	1938	The old BSD-derived file format used for profile data does not contain a
	1939	magic cookie that allows one to check whether a data file really is a
	1940	@code{gprof} file. Furthermore, it does not provide a version number, thus
	1941	rendering changes to the file format almost impossible. @sc{gnu} @code{gprof}
	1942	uses a new file format that provides these features. For backward
	1943	compatibility, @sc{gnu} @code{gprof} continues to support the old BSD-derived
	1944	format, but not all features are supported with it. For example,
	1945	basic-block execution counts cannot be accommodated by the old file
	1946	format.
	1947
	1948	The new file format is defined in header file @file{gmon_out.h}. It
	1949	consists of a header containing the magic cookie and a version number,
	1950	as well as some spare bytes available for future extensions. All data
	1951	in a profile data file is in the native format of the target for which
	1952	the profile was collected. @sc{gnu} @code{gprof} adapts automatically
	1953	to the byte-order in use.
	1954
	1955	In the new file format, the header is followed by a sequence of
	1956	records. Currently, there are three different record types: histogram
	1957	records, call-graph arc records, and basic-block execution count
	1958	records. Each file can contain any number of each record type. When
	1959	reading a file, @sc{gnu} @code{gprof} will ensure records of the same type are
	1960	compatible with each other and compute the union of all records. For
	1961	example, for basic-block execution counts, the union is simply the sum
	1962	of all execution counts for each basic-block.
	1963
	1964	@subsection Histogram Records
	1965
	1966	Histogram records consist of a header that is followed by an array of
	1967	bins. The header contains the text-segment range that the histogram
	1968	spans, the size of the histogram in bytes (unlike in the old BSD
	1969	format, this does not include the size of the header), the rate of the
	1970	profiling clock, and the physical dimension that the bin counts
	1971	represent after being scaled by the profiling clock rate. The
	1972	physical dimension is specified in two parts: a long name of up to 15
	1973	characters and a single character abbreviation. For example, a
	1974	histogram representing real-time would specify the long name as
	1975	``seconds'' and the abbreviation as ``s''. This feature is useful for
	1976	architectures that support performance monitor hardware (which,
	1977	fortunately, is becoming increasingly common). For example, under DEC
	1978	OSF/1, the ``uprofile'' command can be used to produce a histogram of,
	1979	say, instruction cache misses. In this case, the dimension in the
	1980	histogram header could be set to ``i-cache misses'' and the abbreviation
	1981	could be set to ``1'' (because it is simply a count, not a physical
	1982	dimension). Also, the profiling rate would have to be set to 1 in
	1983	this case.
	1984
	1985	Histogram bins are 16-bit numbers and each bin represents an equal
	1986	amount of text-space. For example, if the text-segment is one
	1987	thousand bytes long and if there are ten bins in the histogram, each
	1988	bin represents one hundred bytes.
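
The mapping from a program counter value to a bin can be sketched in
Python as below. The helper @code{bin_for_address} and the starting
address are assumptions made for the example, which uses the
1000-byte, ten-bin layout just described.

@example
def bin_for_address(pc, low_pc, high_pc, num_bins):
    # Each bin covers an equal share of the range [low_pc, high_pc).
    return (pc - low_pc) * num_bins // (high_pc - low_pc)

# A 1000-byte text segment with ten bins: 100 bytes per bin.
low_pc = 0x1000
high_pc = low_pc + 1000
hist = [0] * 10
for pc in (low_pc, low_pc + 99, low_pc + 250, low_pc + 999):
    hist[bin_for_address(pc, low_pc, high_pc, 10)] += 1
print(hist)
@end example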
	1989
	1990
	1991	@subsection Call-Graph Records
	1992
	1993	Call-graph records have a format that is identical to the one used in
	1994	the BSD-derived file format. It consists of an arc in the call graph
	1995	and a count indicating the number of times the arc was traversed
	1996	during program execution. Arcs are specified by a pair of addresses:
	1997	the first must be within the caller's function and the second must be
	1998	within the callee's function. When performing profiling at the
	1999	function level, these addresses can point anywhere within the
	2000	respective function. However, when profiling at the line-level, it is
	2001	better if the addresses are as close to the call-site/entry-point as
	2002	possible. This will ensure that the line-level call-graph is able to
	2003	identify exactly which line of source code performed calls to a
	2004	function.
	2005
	2006	@subsection Basic-Block Execution Count Records
	2007
	2008	Basic-block execution count records consist of a header followed by a
	2009	sequence of address/count pairs. The header simply specifies the
	2010	length of the sequence. In an address/count pair, the address
	2011	identifies a basic-block and the count specifies the number of times
	2012	that basic-block was executed. Any address within the basic-block can
	2013	be used.
	2014
	2015	@node Internals
	2016	@section @code{gprof}'s Internal Operation
	2017
	2018	Like most programs, @code{gprof} begins by processing its options.
	2019	During this stage, it may build its symspec list
	2020	(@code{sym_ids.c:@-sym_id_add}), if
	2021	options are specified which use symspecs.
	2022	@code{gprof} maintains a single linked list of symspecs,
	2023	which will eventually get turned into 12 symbol tables,
	2024	organized into six include/exclude pairs---one
	2025	pair each for the flat profile (INCL_FLAT/EXCL_FLAT),
	2026	the call graph arcs (INCL_ARCS/EXCL_ARCS),
	2027	printing in the call graph (INCL_GRAPH/EXCL_GRAPH),
	2028	timing propagation in the call graph (INCL_TIME/EXCL_TIME),
	2029	the annotated source listing (INCL_ANNO/EXCL_ANNO),
	2030	and the execution count listing (INCL_EXEC/EXCL_EXEC).
	2031
	2032	After option processing, @code{gprof} finishes
	2033	building the symspec list by adding all the symspecs in
	2034	@code{default_excluded_list} to the exclude lists
	2035	EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is specified,
	2036	EXCL_FLAT as well.
	2037	These default excludes are not added to EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC.
	2038
	2039	Next, the BFD library is called to open the object file,
	2040	verify that it is an object file,
	2041	and read its symbol table (@code{core.c:@-core_init}),
	2042	using @code{bfd_canonicalize_symtab} after mallocing
	2043	an appropriately sized array of symbols. At this point,
	2044	function mappings are read (if the @samp{--file-ordering} option
	2045	has been specified), and the core text space is read into
	2046	memory (if the @samp{-c} option was given).
	2047
	2048	@code{gprof}'s own symbol table, an array of Sym structures,
	2049	is now built.
	2050	This is done in one of two ways, by one of two routines, depending
	2051	on whether line-by-line profiling (@samp{-l} option) has been
	2052	enabled.
	2053	For normal profiling, the BFD canonical symbol table is scanned.
	2054	For line-by-line profiling, every
	2055	text space address is examined, and a new symbol table entry
	2056	gets created every time the line number changes.
	2057	In either case, two passes are made through the symbol
	2058	table---one to count the size of the symbol table required,
	2059	and the other to actually read the symbols. In between the
	2060	two passes, a single array of type @code{Sym} is created of
	2061	the appropriate length.
	2062	Finally, @code{symtab.c:@-symtab_finalize}
	2063	is called to sort the symbol table and remove duplicate entries
	2064	(entries with the same memory address).
	2065
	2066	The symbol table must be a contiguous array for two reasons.
	2067	First, the @code{qsort} library function (which sorts an array)
	2068	will be used to sort the symbol table.
	2069	Also, the symbol lookup routine (@code{symtab.c:@-sym_lookup}),
	2070	which finds symbols
	2071	based on memory address, uses a binary search algorithm
	2072	which requires the symbol table to be a sorted array.
	2073	Function symbols are indicated with an @code{is_func} flag.
	2074	Line number symbols have no special flags set.
	2075	Additionally, a symbol can have an @code{is_static} flag
	2076	to indicate that it is a local symbol.
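
The address lookup can be pictured with the following Python sketch,
which uses the standard @code{bisect} module in place of a hand-written
binary search. The function @code{lookup_by_address}, the symbol
addresses, and the two parallel lists are assumptions for illustration
and do not mirror the actual @code{Sym} layout.

@example
import bisect

def lookup_by_address(starts, names, address):
    # starts is sorted ascending; pick the last symbol whose
    # start address is at or below the one being looked up.
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None
    return names[i]

starts = [0x100, 0x180, 0x240]
names = ["main", "ct_init", "init_block"]
print(lookup_by_address(starts, names, 0x1a0))
@end example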
	2077
	2078	With the symbol table read, the symspecs can now be translated
	2079	into Syms (@code{sym_ids.c:@-sym_id_parse}). Remember that a single
	2080	symspec can match multiple symbols.
	2081	An array of symbol tables
	2082	(@code{syms}) is created, each entry of which is a symbol table
	2083	of Syms to be included or excluded from a particular listing.
	2084	The master symbol table and the symspecs are examined by nested
	2085	loops, and every symbol that matches a symspec is inserted
	2086	into the appropriate syms table. This is done twice, once to
	2087	count the size of each required symbol table, and again to build
	2088	the tables, which have been malloced between passes.
	2089	From now on, to determine whether a symbol is on an include
	2090	or exclude symspec list, @code{gprof} simply uses its
	2091	standard symbol lookup routine on the appropriate table
	2092	in the @code{syms} array.
	2093
	2094	Now the profile data file(s) themselves are read
	2095	(@code{gmon_io.c:@-gmon_out_read}),
	2096	first by checking for a new-style @samp{gmon.out} header,
	2097	then assuming this is an old-style BSD @samp{gmon.out}
	2098	if the magic number test failed.
	2099
	2100	New-style histogram records are read by @code{hist.c:@-hist_read_rec}.
	2101	For the first histogram record, allocate a memory array to hold
	2102	all the bins, and read them in.
	2103	When multiple profile data files (or files with multiple histogram
	2104	records) are read, the memory ranges of each pair of histogram records
	2105	must be either equal, or non-overlapping. For each pair of histogram
	2106	records, the resolution (memory region size divided by the number of
	2107	bins) must be the same. The time unit must be the same for all
	2108	histogram records. If the above constraints are met, all histograms
	2109	for the same memory range are merged.
	2110
	2111	As each call graph record is read (@code{call_graph.c:@-cg_read_rec}),
	2112	the parent and child addresses
	2113	are matched to symbol table entries, and a call graph arc is
	2114	created by @code{cg_arcs.c:@-arc_add}, unless the arc fails a symspec
	2115	check against INCL_ARCS/EXCL_ARCS. As each arc is added,
	2116	a linked list is maintained of the parent's child arcs, and of the child's
	2117	parent arcs.
	2118	Both the child's call count and the arc's call count are
	2119	incremented by the record's call count.
	2120
	2121	Basic-block records are read (@code{basic_blocks.c:@-bb_read_rec}),
	2122	but only if line-by-line profiling has been selected.
	2123	Each basic-block address is matched to a corresponding line
	2124	symbol in the symbol table, and an entry made in the symbol's
	2125	bb_addr and bb_calls arrays. Again, if multiple basic-block
	2126	records are present for the same address, the call counts
	2127	are cumulative.
	2128
	2129	A @file{gmon.sum} file is dumped, if requested (@code{gmon_io.c:@-gmon_out_write}).
	2130
	2131	If histograms were present in the data files, assign them to symbols
	2132	(@code{hist.c:@-hist_assign_samples}) by iterating over all the sample
	2133	bins and assigning them to symbols. Since the symbol table
	2134	is sorted in order of ascending memory addresses, we can
	2135	simply follow along in the symbol table as we make our pass
	2136	over the sample bins.
	2137	This step includes a symspec check against INCL_FLAT/EXCL_FLAT.
	2138	Depending on the histogram
	2139	scale factor, a sample bin may span multiple symbols,
	2140	in which case a fraction of the sample count is allocated
	2141	to each symbol, proportional to the degree of overlap.
	2142	This effect is rare for normal profiling, but overlaps
	2143	are more common during line-by-line profiling, and can
	2144	cause each of two adjacent lines to be credited with half
	2145	a hit, for example.
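
The half-hit case can be reproduced with a small Python sketch of
proportional allocation. The function @code{split_bin}, the address
ranges, and the symbol names are invented for the example; it shows
the idea only, not the code in @code{hist.c}.

@example
def split_bin(bin_lo, bin_hi, count, symbols):
    # symbols holds (name, start, end) ranges, end exclusive.
    shares = []
    for name, start, end in symbols:
        overlap = min(bin_hi, end) - max(bin_lo, start)
        if overlap > 0:
            # Credit hits in proportion to the overlap.
            share = count * overlap / (bin_hi - bin_lo)
            shares.append((name, share))
    return shares

# One hit in a 4-byte bin that straddles two source lines.
lines = [("line 10", 96, 102), ("line 11", 102, 110)]
print(split_bin(100, 104, 1, lines))
@end example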
	2146
	2147	If call graph data is present, @code{cg_arcs.c:@-cg_assemble} is called.
	2148	First, if @samp{-c} was specified, a machine-dependent
	2149	routine (@code{find_call}) scans through each symbol's machine code,
	2150	looking for subroutine call instructions, and adding them
	2151	to the call graph with a zero call count.
	2152	A topological sort is performed by depth-first numbering
	2153	all the symbols (@code{cg_dfn.c:@-cg_dfn}), so that
	2154	children are always numbered less than their parents,
	2155	then making an array of pointers into the symbol table and sorting it into
	2156	numerical order, which is reverse topological
	2157	order (children appear before parents).
	2158	Cycles are also detected at this point, and all members
	2159	of a cycle are assigned the same topological number.
	2160	Two passes are now made through this sorted array of symbol pointers.
	2161	The first pass, from end to beginning (parents to children),
	2162	computes the fraction of child time to propagate to each parent
	2163	and a print flag.
	2164	The print flag reflects symspec handling of INCL_GRAPH/EXCL_GRAPH,
	2165	with a parent's include or exclude (print or no print) property
	2166	being propagated to its children, unless they themselves explicitly appear
	2167	in INCL_GRAPH or EXCL_GRAPH.
	2168	A second pass, from beginning to end (children to parents) actually
	2169	propagates the timings along the call graph, subject
	2170	to a check against INCL_TIME/EXCL_TIME.
	2171	With the print flag, fractions, and timings now stored in the symbol
	2172	structures, the topological sort array is now discarded, and a
	2173	new array of pointers is assembled, this time sorted by propagated time.
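
The numbering and the children-to-parents propagation can be sketched as below. This is a deliberately reduced illustration, not the logic of @code{cg_dfn.c} or @code{cg_arcs.c}: the @code{Node} and @code{Edge} types and all function names are hypothetical, cycles are not collapsed, the print flag and symspec checks are omitted, and the per-parent fraction is computed inline as arc count divided by the child's total calls instead of in a separate first pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES 16

typedef struct {
    double self_time;       /* time spent in the function itself */
    double child_time;      /* time inherited from its children */
    unsigned long ncalls;   /* total calls into this function */
    int top_order;          /* 0 = not numbered yet, -1 = visit in progress */
} Node;

typedef struct {
    int parent, child;      /* indices into the node array */
    unsigned long count;    /* calls along this arc */
} Edge;

/* Depth-first numbering: a node is numbered only after all of its
   children, so children always receive smaller numbers. */
static void dfn_visit(Node *nodes, const Edge *edges, size_t nedges,
                      int n, int *counter)
{
    if (nodes[n].top_order != 0)
        return;
    nodes[n].top_order = -1;   /* a fuller version would detect cycles here */
    for (size_t e = 0; e < nedges; e++)
        if (edges[e].parent == n && nodes[edges[e].child].top_order == 0)
            dfn_visit(nodes, edges, nedges, edges[e].child, counter);
    nodes[n].top_order = ++*counter;
}

static int by_top_order(const void *a, const void *b)
{
    const Node *x = *(const Node *const *) a;
    const Node *y = *(const Node *const *) b;
    return (x->top_order > y->top_order) - (x->top_order < y->top_order);
}

/* Number the nodes, sort pointers into numerical order (children first),
   then push each node's total time up to its parents. */
static void propagate(Node *nodes, size_t nnodes,
                      const Edge *edges, size_t nedges)
{
    Node *order[MAX_NODES];
    int counter = 0;

    assert(nnodes <= MAX_NODES);
    for (size_t n = 0; n < nnodes; n++)
        dfn_visit(nodes, edges, nedges, (int) n, &counter);
    for (size_t n = 0; n < nnodes; n++)
        order[n] = &nodes[n];
    qsort(order, nnodes, sizeof order[0], by_top_order);

    for (size_t i = 0; i < nnodes; i++) {
        Node *c = order[i];
        if (c->ncalls == 0)
            continue;          /* never called: nothing to hand upward */
        double total = c->self_time + c->child_time;
        for (size_t e = 0; e < nedges; e++)
            if (&nodes[edges[e].child] == c)
                nodes[edges[e].parent].child_time +=
                    total * (double) edges[e].count / (double) c->ncalls;
    }
}
```

Processing children before parents means that by the time a function's total is passed upward, it already includes everything inherited from its own children.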
	2174
	2175	Finally, the outputs the user requested are printed, which is now fairly
	2176	straightforward. The call graph (@code{cg_print.c:@-cg_print}) and
	2177	flat profile (@code{hist.c:@-hist_print}) are regurgitations of values
	2178	already computed. The annotated source listing
	2179	(@code{basic_blocks.c:@-print_annotated_source}) uses basic-block
	2180	information, if present, to label each line of code with call counts;
	2181	otherwise, only the function call counts are presented.
	2182
	2183	The function ordering code is marginally well documented
	2184	in the source code itself (@code{cg_print.c}). Basically,
	2185	the functions with the most use and the most parents are
	2186	placed first, followed by other functions with the most use,
	2187	followed by lower use functions, followed by unused functions
	2188	at the end.
	2189
	2190	@node Debugging
	2191	@section Debugging @code{gprof}
	2192
	2193	If @code{gprof} was compiled with debugging enabled,
	2194	the @samp{-d} option triggers debugging output
	2195	(to stdout), which can be helpful in understanding its operation.
	2196	The debugging number specified is interpreted as a sum of the following
	2197	options:
	2198
	2199	@table @asis
	2200	@item 2 - Topological sort
	2201	Monitors depth-first numbering of symbols during call graph analysis
	2202	@item 4 - Cycles
	2203	Shows symbols as they are identified as cycle heads
	2204	@item 16 - Tallying
	2205	As the call graph arcs are read, show each arc and how
	2206	the total calls to each function are tallied
	2207	@item 32 - Call graph arc sorting
	2208	Details sorting individual parents/children within each call graph entry
	2209	@item 64 - Reading histogram and call graph records
	2210	Shows address ranges of histograms as they are read, and each
	2211	call graph arc
	2212	@item 128 - Symbol table
	2213	Reading, classifying, and sorting the symbol table from the object file.
	2214	For line-by-line profiling (@samp{-l} option), also shows line numbers
	2215	being assigned to memory addresses.
	2216	@item 256 - Static call graph
	2217	Traces operation of the @samp{-c} option
	2218	@item 512 - Symbol table and arc table lookups
	2219	Details operation of the lookup routines
	2220	@item 1024 - Call graph propagation
	2221	Shows how function times are propagated along the call graph
	2222	@item 2048 - Basic-blocks
	2223	Shows basic-block records as they are read from profile data
	2224	(only meaningful with @samp{-l} option)
	2225	@item 4096 - Symspecs
	2226	Shows symspec-to-symbol pattern matching operation
	2227	@item 8192 - Annotate source
	2228	Tracks operation of @samp{-A} option
	2229	@end table
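
Because every value in the table is a distinct power of two, a sum of options behaves like a bit mask, and each option can be tested independently. The following C sketch shows the idea. The numeric values are copied from the table above, but the enumerator names and the helper @code{debug_enabled} are hypothetical and are not claimed to match the identifiers used in the @code{gprof} sources.

```c
#include <assert.h>

/* Values from the table above; the names are illustrative only. */
enum {
    DBG_TOPO_SORT   = 2,
    DBG_CYCLES      = 4,
    DBG_TALLY       = 16,
    DBG_ARC_SORT    = 32,
    DBG_READ        = 64,
    DBG_SYMTAB      = 128,
    DBG_STATIC_CG   = 256,
    DBG_LOOKUP      = 512,
    DBG_PROPAGATE   = 1024,
    DBG_BASIC_BLOCK = 2048,
    DBG_SYMSPEC     = 4096,
    DBG_ANNOTATE    = 8192
};

/* Nonzero if `flag` is part of the summed debugging number `level`. */
static int debug_enabled(unsigned level, unsigned flag)
{
    return (level & flag) != 0;
}
```

For example, a debugging number of 130 (2 + 128) selects the topological sort and symbol table output and nothing else.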
	2230
	2231	@node GNU Free Documentation License
	2232	@appendix GNU Free Documentation License
	2233	@include fdl.texi
	2234
	2235	@bye
	2236
	2237	NEEDS AN INDEX
	2238
	2239	-T - "traditional BSD style": How is it different? Should the
	2240	differences be documented?
	2241
	2242	example flat file adds up to 100.01%...
	2243
	2244	note: time estimates now only go out to one decimal place (0.0), where
	2245	they used to extend two (78.67).