[LTO] Remove module id from summary index
The module paths string table mapped each path to both an id, sequentially
assigned during LTO linking, and the module hash. The former is a leftover
from before the module hash was added for caching; the hash subsequently
replaced the module id when renaming promoted symbols (to avoid effects
due to link order changes). The sequentially assigned module id was not
removed, however, as it was still a convenience when serializing to/from
bitcode and assembly.

This patch removes the module id from this table, since it isn't
strictly needed and can lead to confusion about when it is appropriate to
use (e.g. see the fix in D156525). Storing it also has a (likely
insignificant) overhead. Where an integer module id is needed (e.g. bitcode
writing), one is assigned on the fly.

There are a couple of test changes since the paths are now sorted
alphanumerically when assigning ids on the fly during assembly writing,
in order to ensure deterministic behavior.
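Below is a minimal sketch of the sorted, on-the-fly id assignment described above. It uses standard containers as stand-ins for llvm::StringMap and llvm::DenseMap; that substitution is an assumption of the sketch. The function name assignModuleIds is hypothetical. The patch itself does the equivalent work inline in the bitcode writer, the assembly writer, and exportToDot().

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for llvm::ModuleHash (five 32-bit words of a SHA1).
using ModuleHash = std::array<uint32_t, 5>;

// Stand-in for the module path table after this change: path -> hash only.
// Like llvm::StringMap, std::unordered_map has no guaranteed iteration order.
using ModulePathTable = std::unordered_map<std::string, ModuleHash>;

// Hypothetical helper: assign consecutive ids (0, 1, 2, ...) to module paths
// only when needed. Paths are sorted first, so the result does not depend on
// hash-table iteration order or on the order in which modules were added.
std::map<std::string, uint64_t> assignModuleIds(const ModulePathTable &Table) {
  std::vector<std::string> Paths;
  Paths.reserve(Table.size());
  for (const auto &Entry : Table)
    Paths.push_back(Entry.first);
  std::sort(Paths.begin(), Paths.end());

  std::map<std::string, uint64_t> Ids;
  for (const auto &Path : Paths)
    Ids.emplace(Path, Ids.size()); // size() is read before the insertion
  return Ids;
}
```

Because the paths are sorted before numbering, two tables containing the same paths receive the same ids regardless of insertion order. With bar.obj and foo.obj, bar.obj maps to 0 and foo.obj maps to 1. That ordering is consistent with the reordered ^0/^1 lines in the updated lld test.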

Differential Revision: https://reviews.llvm.org/D156730
teresajohnson committed Sep 1, 2023
1 parent b0b3f82 commit bbe8cd1
Showing 14 changed files with 110 additions and 103 deletions.
4 changes: 2 additions & 2 deletions lld/test/COFF/thinlto-index-only.ll
@@ -49,8 +49,8 @@
; RUN: llvm-ar rcsT %t5.lib %t/bar.obj %t3.obj
; RUN: lld-link -thinlto-index-only -entry:main %t/foo.obj %t5.lib
; RUN: llvm-dis -o - %t/foo.obj.thinlto.bc | FileCheck %s --check-prefix=THINARCHIVE
; THINARCHIVE: ^0 = module: (path: "{{.*}}foo.obj",
; THINARCHIVE: ^1 = module: (path: "{{.*}}bar.obj",
; THINARCHIVE: ^0 = module: (path: "{{.*}}bar.obj",
; THINARCHIVE: ^1 = module: (path: "{{.*}}foo.obj",

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.0.24215"
4 changes: 1 addition & 3 deletions llvm/include/llvm/Bitcode/BitcodeReader.h
@@ -158,7 +158,6 @@ struct ParserCallbacks {
/// into CombinedIndex.
Error
readSummary(ModuleSummaryIndex &CombinedIndex, StringRef ModulePath,
uint64_t ModuleId,
std::function<bool(GlobalValue::GUID)> IsPrevailing = nullptr);
};

@@ -225,8 +224,7 @@ struct ParserCallbacks {

/// Parse the specified bitcode buffer and merge the index into CombinedIndex.
Error readModuleSummaryIndex(MemoryBufferRef Buffer,
ModuleSummaryIndex &CombinedIndex,
uint64_t ModuleId);
ModuleSummaryIndex &CombinedIndex);

/// Parse the module summary index out of an IR file and return the module
/// summary index object if found, or an empty summary if not. If Path refers
23 changes: 7 additions & 16 deletions llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -1225,10 +1225,9 @@ using ModuleHash = std::array<uint32_t, 5>;
using const_gvsummary_iterator = GlobalValueSummaryMapTy::const_iterator;
using gvsummary_iterator = GlobalValueSummaryMapTy::iterator;

/// String table to hold/own module path strings, which additionally holds the
/// module ID assigned to each module during the plugin step, as well as a hash
/// String table to hold/own module path strings, as well as a hash
/// of the module. The StringMap makes a copy of and owns inserted strings.
using ModulePathStringTableTy = StringMap<std::pair<uint64_t, ModuleHash>>;
using ModulePathStringTableTy = StringMap<ModuleHash>;

/// Map of global value GUID to its summary, used to identify values defined in
/// a particular module, and provide efficient access to their summary.
@@ -1674,25 +1673,18 @@ class ModuleSummaryIndex {
bool PerModuleIndex = true) const;

/// Table of modules, containing module hash and id.
const StringMap<std::pair<uint64_t, ModuleHash>> &modulePaths() const {
const StringMap<ModuleHash> &modulePaths() const {
return ModulePathStringTable;
}

/// Table of modules, containing hash and id.
StringMap<std::pair<uint64_t, ModuleHash>> &modulePaths() {
return ModulePathStringTable;
}

/// Get the module ID recorded for the given module path.
uint64_t getModuleId(const StringRef ModPath) const {
return ModulePathStringTable.lookup(ModPath).first;
}
StringMap<ModuleHash> &modulePaths() { return ModulePathStringTable; }

/// Get the module SHA1 hash recorded for the given module path.
const ModuleHash &getModuleHash(const StringRef ModPath) const {
auto It = ModulePathStringTable.find(ModPath);
assert(It != ModulePathStringTable.end() && "Module not registered");
return It->second.second;
return It->second;
}

/// Convenience method for creating a promoted global name
@@ -1723,9 +1715,8 @@ class ModuleSummaryIndex {

/// Add a new module with the given \p Hash, mapped to the given \p
/// ModID, and return a reference to the module.
ModuleInfo *addModule(StringRef ModPath, uint64_t ModId,
ModuleHash Hash = ModuleHash{{0}}) {
return &*ModulePathStringTable.insert({ModPath, {ModId, Hash}}).first;
ModuleInfo *addModule(StringRef ModPath, ModuleHash Hash = ModuleHash{{0}}) {
return &*ModulePathStringTable.insert({ModPath, Hash}).first;
}

/// Return module entry for module with the given \p ModPath.
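The header change above collapses each table entry from a (module id, hash) pair to the hash alone. As a result, accessors such as getModuleHash now return It->second instead of It->second.second, and addModule no longer takes an id. The following simplified model illustrates that shape. The class name ModulePathTableModel is hypothetical, and std::unordered_map stands in for llvm::StringMap; both are assumptions of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using ModuleHash = std::array<uint32_t, 5>;

// Hypothetical model of the module path table after this patch: each path
// maps directly to its hash, so there is no .first/.second pair to unpack.
class ModulePathTableModel {
  std::unordered_map<std::string, ModuleHash> Table;

public:
  // Mirrors the new addModule(ModPath, Hash = ModuleHash{{0}}), which has no id
  // parameter. Like StringMap::insert, it leaves an existing entry unchanged.
  const ModuleHash &addModule(const std::string &Path,
                              ModuleHash Hash = ModuleHash{{0}}) {
    return Table.insert({Path, Hash}).first->second;
  }

  // Mirrors getModuleHash(): the mapped value is the hash itself.
  const ModuleHash &getModuleHash(const std::string &Path) const {
    auto It = Table.find(Path);
    assert(It != Table.end() && "Module not registered");
    return It->second; // previously It->second.second
  }
};
```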
2 changes: 1 addition & 1 deletion llvm/lib/AsmParser/LLParser.cpp
@@ -8181,7 +8181,7 @@ bool LLParser::parseModuleEntry(unsigned ID) {
parseToken(lltok::rparen, "expected ')' here"))
return true;

auto ModuleEntry = Index->addModule(Path, ID, Hash);
auto ModuleEntry = Index->addModule(Path, Hash);
ModuleIdMap[ID] = ModuleEntry->first();

return false;
28 changes: 11 additions & 17 deletions llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -904,10 +904,6 @@ class ModuleSummaryIndexBitcodeReader : public BitcodeReaderBase {
/// path to the bitcode file.
StringRef ModulePath;

/// For per-module summary indexes, the unique numerical identifier given to
/// this module by the client.
unsigned ModuleId;

/// Callback to ask whether a symbol is the prevailing copy when invoked
/// during combined index building.
std::function<bool(GlobalValue::GUID)> IsPrevailing;
@@ -919,7 +915,7 @@ class ModuleSummaryIndexBitcodeReader : public BitcodeReaderBase {
public:
ModuleSummaryIndexBitcodeReader(
BitstreamCursor Stream, StringRef Strtab, ModuleSummaryIndex &TheIndex,
StringRef ModulePath, unsigned ModuleId,
StringRef ModulePath,
std::function<bool(GlobalValue::GUID)> IsPrevailing = nullptr);

Error parseModule();
@@ -6699,13 +6695,12 @@ std::vector<StructType *> BitcodeReader::getIdentifiedStructTypes() const {

ModuleSummaryIndexBitcodeReader::ModuleSummaryIndexBitcodeReader(
BitstreamCursor Cursor, StringRef Strtab, ModuleSummaryIndex &TheIndex,
StringRef ModulePath, unsigned ModuleId,
std::function<bool(GlobalValue::GUID)> IsPrevailing)
StringRef ModulePath, std::function<bool(GlobalValue::GUID)> IsPrevailing)
: BitcodeReaderBase(std::move(Cursor), Strtab), TheIndex(TheIndex),
ModulePath(ModulePath), ModuleId(ModuleId), IsPrevailing(IsPrevailing) {}
ModulePath(ModulePath), IsPrevailing(IsPrevailing) {}

void ModuleSummaryIndexBitcodeReader::addThisModule() {
TheIndex.addModule(ModulePath, ModuleId);
TheIndex.addModule(ModulePath);
}

ModuleSummaryIndex::ModuleInfo *
@@ -6936,7 +6931,7 @@ Error ModuleSummaryIndexBitcodeReader::parseModule() {
case bitc::MODULE_CODE_HASH: {
if (Record.size() != 5)
return error("Invalid hash length " + Twine(Record.size()).str());
auto &Hash = getThisModule()->second.second;
auto &Hash = getThisModule()->second;
int Pos = 0;
for (auto &Val : Record) {
assert(!(Val >> 32) && "Unexpected high bits set");
@@ -7697,7 +7692,7 @@ Error ModuleSummaryIndexBitcodeReader::parseModuleStringTable() {
if (convertToString(Record, 1, ModulePath))
return error("Invalid record");

LastSeenModule = TheIndex.addModule(ModulePath, ModuleId);
LastSeenModule = TheIndex.addModule(ModulePath);
ModuleIdMap[ModuleId] = LastSeenModule->first();

ModulePath.clear();
@@ -7712,7 +7707,7 @@ Error ModuleSummaryIndexBitcodeReader::parseModuleStringTable() {
int Pos = 0;
for (auto &Val : Record) {
assert(!(Val >> 32) && "Unexpected high bits set");
LastSeenModule->second.second[Pos++] = Val;
LastSeenModule->second[Pos++] = Val;
}
// Reset LastSeenModule to avoid overriding the hash unexpectedly.
LastSeenModule = nullptr;
@@ -7970,14 +7965,14 @@ BitcodeModule::getLazyModule(LLVMContext &Context, bool ShouldLazyLoadMetadata,
// module path used in the combined summary (e.g. when reading summaries for
// regular LTO modules).
Error BitcodeModule::readSummary(
ModuleSummaryIndex &CombinedIndex, StringRef ModulePath, uint64_t ModuleId,
ModuleSummaryIndex &CombinedIndex, StringRef ModulePath,
std::function<bool(GlobalValue::GUID)> IsPrevailing) {
BitstreamCursor Stream(Buffer);
if (Error JumpFailed = Stream.JumpToBit(ModuleBit))
return JumpFailed;

ModuleSummaryIndexBitcodeReader R(std::move(Stream), Strtab, CombinedIndex,
ModulePath, ModuleId, IsPrevailing);
ModulePath, IsPrevailing);
return R.parseModule();
}

@@ -8183,13 +8178,12 @@ Expected<std::string> llvm::getBitcodeProducerString(MemoryBufferRef Buffer) {
}

Error llvm::readModuleSummaryIndex(MemoryBufferRef Buffer,
ModuleSummaryIndex &CombinedIndex,
uint64_t ModuleId) {
ModuleSummaryIndex &CombinedIndex) {
Expected<BitcodeModule> BM = getSingleModule(Buffer);
if (!BM)
return BM.takeError();

return BM->readSummary(CombinedIndex, BM->getModuleIdentifier(), ModuleId);
return BM->readSummary(CombinedIndex, BM->getModuleIdentifier());
}

Expected<std::unique_ptr<ModuleSummaryIndex>>
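On the reading side, the module string table records in the bitcode still carry a numeric id. The reader no longer stores that id in the index. Instead, as the parseModuleStringTable() hunk shows, it keeps a reader-local ModuleIdMap from the on-disk id to the module path, which it uses to resolve summary records. The following is a simplified model of that split. The ReaderModel type and the (id, path) record layout are hypothetical simplifications, and hash records are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decoded module string table entry: (id, path).
using ModuleEntryRecord = std::pair<uint64_t, std::string>;

struct ReaderModel {
  // Index side: only the module paths are recorded (hashes omitted here).
  std::set<std::string> IndexPaths;
  // Reader-local translation from on-disk id to path; it lives only as long
  // as the reader and is never stored in the index.
  std::map<uint64_t, std::string> ModuleIdMap;

  void parseModuleStringTable(const std::vector<ModuleEntryRecord> &Records) {
    for (const auto &[Id, Path] : Records) {
      IndexPaths.insert(Path);
      ModuleIdMap[Id] = Path;
    }
  }

  // Summary records refer to their module by id; resolve it to a path.
  const std::string &modulePathForId(uint64_t Id) const {
    auto It = ModuleIdMap.find(Id);
    assert(It != ModuleIdMap.end() && "unknown module id");
    return It->second;
  }
};
```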
77 changes: 46 additions & 31 deletions llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -431,6 +431,10 @@ class IndexBitcodeWriter : public BitcodeWriterBase {
/// Tracks the last value id recorded in the GUIDToValueMap.
unsigned GlobalValueId = 0;

/// Tracks the assignment of module paths in the module path string table to
/// an id assigned for use in summary references to the module path.
DenseMap<StringRef, uint64_t> ModuleIdMap;

public:
/// Constructs a IndexBitcodeWriter object for the given combined index,
/// writing to the provided \p Buffer. When writing a subset of the index
@@ -512,8 +516,16 @@ class IndexBitcodeWriter : public BitcodeWriterBase {
Callback(*MPI);
}
} else {
for (const auto &MPSE : Index.modulePaths())
Callback(MPSE);
// Since StringMap iteration order isn't guaranteed, order by path string
// first.
// FIXME: Make this a vector of StringMapEntry instead to avoid the later
// map lookup.
std::vector<StringRef> ModulePaths;
for (auto &[ModPath, _] : Index.modulePaths())
ModulePaths.push_back(ModPath);
llvm::sort(ModulePaths.begin(), ModulePaths.end());
for (auto &ModPath : ModulePaths)
Callback(*Index.modulePaths().find(ModPath));
}
}

@@ -3715,33 +3727,33 @@ void IndexBitcodeWriter::writeModStrings() {
unsigned AbbrevHash = Stream.EmitAbbrev(std::move(Abbv));

SmallVector<unsigned, 64> Vals;
forEachModule(
[&](const StringMapEntry<std::pair<uint64_t, ModuleHash>> &MPSE) {
StringRef Key = MPSE.getKey();
const auto &Value = MPSE.getValue();
StringEncoding Bits = getStringEncoding(Key);
unsigned AbbrevToUse = Abbrev8Bit;
if (Bits == SE_Char6)
AbbrevToUse = Abbrev6Bit;
else if (Bits == SE_Fixed7)
AbbrevToUse = Abbrev7Bit;

Vals.push_back(Value.first);
Vals.append(Key.begin(), Key.end());

// Emit the finished record.
Stream.EmitRecord(bitc::MST_CODE_ENTRY, Vals, AbbrevToUse);

// Emit an optional hash for the module now
const auto &Hash = Value.second;
if (llvm::any_of(Hash, [](uint32_t H) { return H; })) {
Vals.assign(Hash.begin(), Hash.end());
// Emit the hash record.
Stream.EmitRecord(bitc::MST_CODE_HASH, Vals, AbbrevHash);
}
forEachModule([&](const StringMapEntry<ModuleHash> &MPSE) {
StringRef Key = MPSE.getKey();
const auto &Hash = MPSE.getValue();
StringEncoding Bits = getStringEncoding(Key);
unsigned AbbrevToUse = Abbrev8Bit;
if (Bits == SE_Char6)
AbbrevToUse = Abbrev6Bit;
else if (Bits == SE_Fixed7)
AbbrevToUse = Abbrev7Bit;

Vals.clear();
});
auto ModuleId = ModuleIdMap.size();
ModuleIdMap[Key] = ModuleId;
Vals.push_back(ModuleId);
Vals.append(Key.begin(), Key.end());

// Emit the finished record.
Stream.EmitRecord(bitc::MST_CODE_ENTRY, Vals, AbbrevToUse);

// Emit an optional hash for the module now
if (llvm::any_of(Hash, [](uint32_t H) { return H; })) {
Vals.assign(Hash.begin(), Hash.end());
// Emit the hash record.
Stream.EmitRecord(bitc::MST_CODE_HASH, Vals, AbbrevHash);
}

Vals.clear();
});
Stream.ExitBlock();
}

@@ -4410,7 +4422,8 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {

if (auto *VS = dyn_cast<GlobalVarSummary>(S)) {
NameVals.push_back(*ValueId);
NameVals.push_back(Index.getModuleId(VS->modulePath()));
assert(ModuleIdMap.count(VS->modulePath()));
NameVals.push_back(ModuleIdMap[VS->modulePath()]);
NameVals.push_back(getEncodedGVSummaryFlags(VS->flags()));
NameVals.push_back(getEncodedGVarFlags(VS->varflags()));
for (auto &RI : VS->refs()) {
@@ -4460,7 +4473,8 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
});

NameVals.push_back(*ValueId);
NameVals.push_back(Index.getModuleId(FS->modulePath()));
assert(ModuleIdMap.count(FS->modulePath()));
NameVals.push_back(ModuleIdMap[FS->modulePath()]);
NameVals.push_back(getEncodedGVSummaryFlags(FS->flags()));
NameVals.push_back(FS->instCount());
NameVals.push_back(getEncodedFFlags(FS->fflags()));
@@ -4520,7 +4534,8 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
auto AliasValueId = SummaryToValueIdMap[AS];
assert(AliasValueId);
NameVals.push_back(AliasValueId);
NameVals.push_back(Index.getModuleId(AS->modulePath()));
assert(ModuleIdMap.count(AS->modulePath()));
NameVals.push_back(ModuleIdMap[AS->modulePath()]);
NameVals.push_back(getEncodedGVSummaryFlags(AS->flags()));
auto AliaseeValueId = SummaryToValueIdMap[&AS->getAliasee()];
assert(AliaseeValueId);
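In the writer hunks above, writeModStrings() assigns each module the next id (ModuleIdMap.size()) as it emits the module string table entry. The combined summary records later look that id up, with an assert that it was assigned. The following is a minimal model of that write-then-reference ordering. WriterModel and the flat Record type are hypothetical stand-ins for IndexBitcodeWriter and bitstream records.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical flat record type standing in for a bitstream record.
using Record = std::vector<uint64_t>;

struct WriterModel {
  std::map<std::string, uint64_t> ModuleIdMap; // path -> id, filled while writing
  std::vector<Record> Out;

  // Mirrors writeModStrings(): each entry gets the next id as it is emitted.
  void writeModStrings(const std::vector<std::string> &SortedPaths) {
    for (const auto &Path : SortedPaths) {
      uint64_t Id = ModuleIdMap.size();
      ModuleIdMap[Path] = Id;
      Record R{Id};
      R.insert(R.end(), Path.begin(), Path.end()); // path characters as values
      Out.push_back(R);
    }
  }

  // Mirrors a combined summary record, which references its module by id.
  // That id must already have been assigned by writeModStrings().
  void writeSummary(uint64_t ValueId, const std::string &ModulePath) {
    auto It = ModuleIdMap.find(ModulePath);
    assert(It != ModuleIdMap.end() && "module strings must be written first");
    Out.push_back(Record{ValueId, It->second});
  }
};
```

This sketch uses find() rather than operator[] for the lookup, so a missing path cannot silently insert a new entry. The patch itself asserts with count() and then uses operator[].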
22 changes: 11 additions & 11 deletions llvm/lib/IR/AsmWriter.cpp
@@ -1069,12 +1069,13 @@ int SlotTracker::processIndex() {

// The first block of slots are just the module ids, which start at 0 and are
// assigned consecutively. Since the StringMap iteration order isn't
// guaranteed, use a std::map to order by module ID before assigning slots.
std::map<uint64_t, StringRef> ModuleIdToPathMap;
for (auto &[ModPath, ModId] : TheIndex->modulePaths())
ModuleIdToPathMap[ModId.first] = ModPath;
for (auto &ModPair : ModuleIdToPathMap)
CreateModulePathSlot(ModPair.second);
// guaranteed, order by path string before assigning slots.
std::vector<StringRef> ModulePaths;
for (auto &[ModPath, _] : TheIndex->modulePaths())
ModulePaths.push_back(ModPath);
llvm::sort(ModulePaths.begin(), ModulePaths.end());
for (auto &ModPath : ModulePaths)
CreateModulePathSlot(ModPath);

// Start numbering the GUIDs after the module ids.
GUIDNext = ModulePathNext;
@@ -2890,12 +2891,11 @@ void AssemblyWriter::printModuleSummaryIndex() {
std::string RegularLTOModuleName =
ModuleSummaryIndex::getRegularLTOModuleName();
moduleVec.resize(TheIndex->modulePaths().size());
for (auto &[ModPath, ModId] : TheIndex->modulePaths())
for (auto &[ModPath, ModHash] : TheIndex->modulePaths())
moduleVec[Machine.getModulePathSlot(ModPath)] = std::make_pair(
// A module id of -1 is a special entry for a regular LTO module created
// during the thin link.
ModId.first == -1u ? RegularLTOModuleName : std::string(ModPath),
ModId.second);
// An empty module path is a special entry for a regular LTO module
// created during the thin link.
ModPath.empty() ? RegularLTOModuleName : std::string(ModPath), ModHash);

unsigned i = 0;
for (auto &ModPair : moduleVec) {
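With the id gone, the assembly writer can no longer detect the special regular LTO module by a module id of -1u. The hunk above keys it on an empty module path instead. The following is a tiny sketch of that check. The literal used for RegularLTOModuleName is an assumed placeholder for ModuleSummaryIndex::getRegularLTOModuleName(), and displayModulePath is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Assumed placeholder for ModuleSummaryIndex::getRegularLTOModuleName().
const std::string RegularLTOModuleName = "[Regular LTO]";

// Hypothetical helper: the regular LTO module created during the thin link is
// recognized by its empty path rather than by a module id of -1u.
std::string displayModulePath(const std::string &ModPath) {
  return ModPath.empty() ? RegularLTOModuleName : ModPath;
}
```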
16 changes: 15 additions & 1 deletion llvm/lib/IR/ModuleSummaryIndex.cpp
@@ -554,6 +554,17 @@ void ModuleSummaryIndex::exportToDot(
std::map<StringRef, GVSOrderedMapTy> ModuleToDefinedGVS;
collectDefinedGVSummariesPerModule(ModuleToDefinedGVS);

// Assign an id to each module path for use in graph labels. Since the
// StringMap iteration order isn't guaranteed, order by path string before
// assigning ids.
std::vector<StringRef> ModulePaths;
for (auto &[ModPath, _] : modulePaths())
ModulePaths.push_back(ModPath);
llvm::sort(ModulePaths);
DenseMap<StringRef, uint64_t> ModuleIdMap;
for (auto &ModPath : ModulePaths)
ModuleIdMap.try_emplace(ModPath, ModuleIdMap.size());

// Get node identifier in form MXXX_<GUID>. The MXXX prefix is required,
// because we may have multiple linkonce functions summaries.
auto NodeId = [](uint64_t ModId, GlobalValue::GUID Id) {
@@ -589,7 +600,10 @@ void ModuleSummaryIndex::exportToDot(

OS << "digraph Summary {\n";
for (auto &ModIt : ModuleToDefinedGVS) {
auto ModId = getModuleId(ModIt.first);
// Will be empty for a just built per-module index, which doesn't setup a
// module paths table. In that case use 0 as the module id.
assert(ModuleIdMap.count(ModIt.first) || ModuleIdMap.empty());
auto ModId = ModuleIdMap.empty() ? 0 : ModuleIdMap[ModIt.first];
OS << " // Module: " << ModIt.first << "\n";
OS << " subgraph cluster_" << std::to_string(ModId) << " {\n";
OS << " style = filled;\n";
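In exportToDot(), the cluster id now comes from a locally built path-to-id map. According to the comment in the hunk, that map is empty for a just-built per-module index, which has no module paths table, and every module then falls back to id 0. The following is a simplified model of that lookup. The function name clusterIdFor is hypothetical, and std::map stands in for llvm::DenseMap.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper: choose the dot cluster id for a module path. An empty
// map means a freshly built per-module index, so id 0 is used.
uint64_t clusterIdFor(const std::map<std::string, uint64_t> &ModuleIdMap,
                      const std::string &ModPath) {
  if (ModuleIdMap.empty())
    return 0;
  auto It = ModuleIdMap.find(ModPath);
  assert(It != ModuleIdMap.end() && "module path missing from id map");
  return It->second;
}
```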
2 changes: 1 addition & 1 deletion llvm/lib/IRPrinter/IRPrintingPasses.cpp
@@ -53,7 +53,7 @@ PreservedAnalyses PrintModulePass::run(Module &M, ModuleAnalysisManager &AM) {
: nullptr;
if (Index) {
if (Index->modulePaths().empty())
Index->addModule("", 0);
Index->addModule("");
Index->print(OS);
}
