work with core dumps from remote site
18 July 2011
One thing that happens to C/C** programs is that they crash at customer sites, usually a Core dump is left after the crash .
Now it happens that core dump is sent to you from a remote customer site, and joy and wonder, you can’t get a meaningful stack trace out of it. (Usually that is the point to start loud chanting of the Segmentation violation - Core dumped blues
Now
-
One reason may be that the stack is smashed in this case you can’t get any information, sorry. (well we have a solution right here , but that’s a different story …)
-
Another reason is that you have compiled the program with Optimization and frame pointer omission enabled, in this case you won’t see all information either. Well one options is not to do that ( add -fno-frame-pointer-omission) . Well actually Yossi Kreinin, with a very clever trick
-
The third one (very likely) is that the crash occurs in a shared library (libc,pthread, oracle client, etc), and it can’t be traced, because your system has a slightly different versions of this shared library as compared to what you have locally on your machine. Now this situation can be solved, so now it’s scripting time (Perl to the rescue)
-
Another scenario is when a system / third party shared library calls you back, In this case you are bound to see a different stack on your machine, of third party library is different from what you have. An example is C** stack unwinding; when an exception is caught, then GLIBC is calling your code, it is calling object destructors of stack based objects.
The script does the following:
-
Get path of all dependent shared libraries (parse result of ldd
) - Copy shared libraries , the core file, the binary file and make an archive out of this
- Create a shell script that instructs GDB to use the copied shared libraries instead of what you got on your system. (here goes the trick)
Now here is the script that does it all.
how to use it
./gather-libs.pl -c core.12345 -e exe_file -o out.tbz2
Creates the archive out.tbz2 that includes
-
Core file core.12345
- Executable exe_file and all its dependent shared libraries
- The script run.sh that invokes gdb with the copied shared libraries
Now once you have copied archive out.tbz2 over to your machine you can open it
tar xvfj out.bz2
Now script run.sh will first check if the executable has any debug information; It happens that applications strip debug information before they ship to a customer, so it is better to check if that happened or not.
If executable has debug information then gdb is run and told to look at the copied shared libraries instead what is installed on your system. To run with a different core file
./run.sh -c core.2345
To run with a different executable and core
,/run.sh -c core.2345 -e another-exe:
How to tell gdb to look at different set of share libraries.
Gdb has the following command
set solib-absolute-path
It tells gdb to look for shared libraries from under
set solib-absolute-prefix .
core-file <core file name>
Then it starts gdb with this generated script (-x option of gdb). Now doing a script that writes another script is an exercise in magic (small magic alas).
Limitations
As always, there are situation when this script does not work properly; For example if the application loads shared libraries dynamically with dlopen / dlsym, then those will not be copied into the archive. (Please tell me that this does not happen often on Linux; Sorry).
Interesting core dump trivia
Once upon a time we got a core file that was huge - about three gigabytes large. Can it be understood from the size of the core dump that the application has run out of memory? Well the answer is a bit tricky, Let’s remember a few things about Virtual memory - there is the ‘Virtual memory size’ - for a given process these are all virtual address ranges summed up; it is the number of all reserved virtual addresses; these addresses may or may not be backed up with real memory. Then there is the ‘Resident set size’ - the number of all physical memory pages that are in use by a process multiplied by page size; (a memory page Is a fixed size unit that backs up a range of virtual memory / is the real memory behind a range of virtual addresses; a page is typically 4096 or 8192 bytes long). The question is now, can the ‘Resident set size’ be extracted from the core dump? Now a core dump is just an ELF format binary file with a special section called “note” - information about the core dump, registers at the time of the crash, etc. The core dump is created by the Linux kernel; so let’s search for the function that is writing the core dump, the Linux cross reference project can be used to search the kernel sources, searches here are faster than grep !
-
First the ELF binary format is described in header include/linux/elf.h ;
- “structure elf32_note”: http://lxr.linux.no/#linux+v3.0.4/include/linux/elf.h#L402 is defined here, it describes the note header that describes the core file. So we should search for platform independent elf_note
- Now function elf_core_dump seems to be writing the core dump.
The function first writes the executable header and note sections (lines 1903 - 1978) ; then an elf section header is written for each virtual memory address range (1981- 2003) then at line 2019 it starts to get interesting , it writes the actual memory pages into the note ext section. Line 2019 loops on all virtual memory ranges; for each range the nested loop at line 2026 loops over all pages that make up a memory range. So line 2030 gets the content of a memory page, if the page is available and backed up by physical memory then it is written into the file (2034), if not then line 2039 just extends the file by the size of one range and puts zeroes into it.
So know we know that the core file is equal to the size of the virtual memory. In principle one could search for consecutive zero ranges, but then it is impossible to tell if a range of zeros is a range of virtual addresses that is not backed by physical memory, or a page of physical memory that is all zero.
The loop that writes the memory ranges.
2019 for (vma = first_vma(current, gate_vma); vma != NULL;
2020 vma = next_vma(vma, gate_vma)) {
2021 unsigned long addr;
2022 unsigned long end;
2023
2024 end = vma->vm_start + vma_dump_size(vma, cprm->mm_flags);
2025
2026 for (addr = vma->vm_start; addr < end; addr += PAGE_SIZE) {
2027 struct page *page;
2028 int stop;
2029
2030 page = get_dump_page(addr);
2031 if (page) {
2032 void *kaddr = kmap(page);
2033 stop = ((size += PAGE_SIZE) > cprm->limit) ||
2034 !dump_write(cprm->file, kaddr,
2035 PAGE_SIZE);
2036 kunmap(page);
2037 page_cache_release(page);
2038 } else
2039 stop = !dump_seek(cprm->file, PAGE_SIZE);
2040 if (stop)
2041 goto end_coredump;
2042 }
2043 }
The big function
1868/*
1869 * Actual dumper
1870 *
1871 * This is a two-pass process; first we find the offsets of the bits,
1872 * and then they are actually written out. If we run out of core limit
1873 * we just truncate.
1874 */
1875static int elf_core_dump(struct coredump_params *cprm)
1876{
1877 int has_dumped = 0;
1878 mm_segment_t fs;
1879 int segs;
1880 size_t size = 0;
1881 struct vm_area_struct *vma, *gate_vma;
1882 struct elfhdr *elf = NULL;
1883 loff_t offset = 0, dataoff, foffset;
1884 struct elf_note_info info;
1885 struct elf_phdr *phdr4note = NULL;
1886 struct elf_shdr *shdr4extnum = NULL;
1887 Elf_Half e_phnum;
1888 elf_addr_t e_shoff;
1889
1890 /*
1891 * We no longer stop all VM operations.
1892 *
1893 * This is because those proceses that could possibly change map_count
1894 * or the mmap / vma pages are now blocked in do_exit on current
1895 * finishing this core dump.
1896 *
1897 * Only ptrace can touch these memory addresses, but it doesn't change
1898 * the map_count or the pages allocated. So no possibility of crashing
1899 * exists while dumping the mm->vm_next areas to the core file.
1900 */
1901
1902 /* alloc memory for large data structures: too large to be on stack */
1903 elf = kmalloc(sizeof(*elf), GFP_KERNEL);
1904 if (!elf)
1905 goto out;
1906 /*
1907 * The number of segs are recored into ELF header as 16bit value.
1908 * Please check DEFAULT_MAX_MAP_COUNT definition when you modify here.
1909 */
1910 segs = current->mm->map_count;
1911 segs += elf_core_extra_phdrs();
1912
1913 gate_vma = get_gate_vma(current->mm);
1914 if (gate_vma != NULL)
1915 segs++;
1916
1917 /* for notes section */
1918 segs++;
1919
1920 /* If segs > PN_XNUM(0xffff), then e_phnum overflows. To avoid
1921 * this, kernel supports extended numbering. Have a look at
1922 * include/linux/elf.h for further information. */
1923 e_phnum = segs > PN_XNUM ? PN_XNUM : segs;
1924
1925 /*
1926 * Collect all the non-memory information about the process for the
1927 * notes. This also sets up the file header.
1928 */
1929 if (!fill_note_info(elf, e_phnum, &info, cprm->signr, cprm->regs))
1930 goto cleanup;
1931
1932 has_dumped = 1;
1933 current->flags |= PF_DUMPCORE;
1934
1935 fs = get_fs();
1936 set_fs(KERNEL_DS);
1937
1938 offset += sizeof(*elf); /* Elf header */
1939 offset += segs * sizeof(struct elf_phdr); /* Program headers */
1940 foffset = offset;
1941
1942 /* Write notes phdr entry */
1943 {
1944 size_t sz = get_note_info_size(&info);
1945
1946 sz += elf_coredump_extra_notes_size();
1947
1948 phdr4note = kmalloc(sizeof(*phdr4note), GFP_KERNEL);
1949 if (!phdr4note)
1950 goto end_coredump;
1951
1952 fill_elf_note_phdr(phdr4note, sz, offset);
1953 offset += sz;
1954 }
1955
1956 dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
1957
1958 offset += elf_core_vma_data_size(gate_vma, cprm->mm_flags);
1959 offset += elf_core_extra_data_size();
1960 e_shoff = offset;
1961
1962 if (e_phnum == PN_XNUM) {
1963 shdr4extnum = kmalloc(sizeof(*shdr4extnum), GFP_KERNEL);
1964 if (!shdr4extnum)
1965 goto end_coredump;
1966 fill_extnum_info(elf, shdr4extnum, e_shoff, segs);
1967 }
1968
1969 offset = dataoff;
1970
1971 size += sizeof(*elf);
1972 if (size > cprm->limit || !dump_write(cprm->file, elf, sizeof(*elf)))
1973 goto end_coredump;
1974
1975 size += sizeof(*phdr4note);
1976 if (size > cprm->limit
1977 || !dump_write(cprm->file, phdr4note, sizeof(*phdr4note)))
1978 goto end_coredump;
1979
1980 /* Write program headers for segments dump */
1981 for (vma = first_vma(current, gate_vma); vma != NULL;
1982 vma = next_vma(vma, gate_vma)) {
1983 struct elf_phdr phdr;
1984
1985 phdr.p_type = PT_LOAD;
1986 phdr.p_offset = offset;
1987 phdr.p_vaddr = vma->vm_start;
1988 phdr.p_paddr = 0;
1989 phdr.p_filesz = vma_dump_size(vma, cprm->mm_flags);
1990 phdr.p_memsz = vma->vm_end - vma->vm_start;
1991 offset += phdr.p_filesz;
1992 phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
1993 if (vma->vm_flags & VM_WRITE)
1994 phdr.p_flags |= PF_W;
1995 if (vma->vm_flags & VM_EXEC)
1996 phdr.p_flags |= PF_X;
1997 phdr.p_align = ELF_EXEC_PAGESIZE;
1998
1999 size += sizeof(phdr);
2000 if (size > cprm->limit
2001 || !dump_write(cprm->file, &phdr, sizeof(phdr)))
2002 goto end_coredump;
2003 }
2004
2005 if (!elf_core_write_extra_phdrs(cprm->file, offset, &size, cprm->limit))
2006 goto end_coredump;
2007
2008 /* write out the notes section */
2009 if (!write_note_info(&info, cprm->file, &foffset))
2010 goto end_coredump;
2011
2012 if (elf_coredump_extra_notes_write(cprm->file, &foffset))
2013 goto end_coredump;
2014
2015 /* Align to page */
2016 if (!dump_seek(cprm->file, dataoff - foffset))
2017 goto end_coredump;
2018
2019 for (vma = first_vma(current, gate_vma); vma != NULL;
2020 vma = next_vma(vma, gate_vma)) {
2021 unsigned long addr;
2022 unsigned long end;
2023
2024 end = vma->vm_start + vma_dump_size(vma, cprm->mm_flags);
2025
2026 for (addr = vma->vm_start; addr < end; addr += PAGE_SIZE) {
2027 struct page *page;
2028 int stop;
2029
2030 page = get_dump_page(addr);
2031 if (page) {
2032 void *kaddr = kmap(page);
2033 stop = ((size += PAGE_SIZE) > cprm->limit) ||
2034 !dump_write(cprm->file, kaddr,
2035 PAGE_SIZE);
2036 kunmap(page);
2037 page_cache_release(page);
2038 } else
2039 stop = !dump_seek(cprm->file, PAGE_SIZE);
2040 if (stop)
2041 goto end_coredump;
2042 }
2043 }
2044
2045 if (!elf_core_write_extra_data(cprm->file, &size, cprm->limit))
2046 goto end_coredump;
2047
2048 if (e_phnum == PN_XNUM) {
2049 size += sizeof(*shdr4extnum);
2050 if (size > cprm->limit
2051 || !dump_write(cprm->file, shdr4extnum,
2052 sizeof(*shdr4extnum)))
2053 goto end_coredump;
2054 }
2055
2056end_coredump:
2057 set_fs(fs);
2058
2059cleanup:
2060 free_note_info(&info);
2061 kfree(shdr4extnum);
2062 kfree(phdr4note);
2063 kfree(elf);
2064out:
2065 return has_dumped;
2066}
2067