P4Transfer - Helix Core Full History Migration Tool

1. Introduction

The script P4Transfer.py solves the problem of transferring changes between unrelated and unconnected Perforce Helix Core repositories.

If you have a 2015.1 or later Helix Core server, then you can use the Helix native DVCS commands (p4 clone/p4 fetch/p4 push). If you have a pre-2015.1 version of Helix Core (or very large repository sizes to be transferred), then P4Transfer.py may be your best option! See the next section for guidance.

While best practice is usually to have only one Perforce Server, the reality is often that many different Perforce Servers are required. A typical example is the Perforce public depot, which sits outside the Perforce Network.

Sometimes you start a project on one server and then realize that it would be nice to replicate these changes to another server. For example, you could start out on a local server on a laptop while being on the road but would like to make the results available on the public server. You may also wish to consolidate various Perforce servers into one with all their history (e.g. after an acquisition of another company).

2. P4Transfer vs Helix native DVCS functions

Both Helix native DVCS and P4Transfer enable migration of detailed file history from a set of paths on one Helix Core server into another server.

Helix native DVCS has additional functionality, such as creating a personal micro-repo for personal use without a continuously running p4d process.

Important
When it is an option, using Helix native DVCS is preferred. But if that can’t be used for some reason, P4Transfer can pretty much always be made to work.
Tip
The new FetchTransfer.py script is a wrapper around the DVCS p4 fetch command with some advantages over P4Transfer.

2.1. Guidance on tool differences

Points to consider for Helix native DVCS:

  • Helix native DVCS is a fully supported product feature.

  • Helix native DVCS requires consistency among the Helix Core servers: Same case sensitivity setting, same Unicode mode, and same P4D version (with some exceptions).

  • Helix native DVCS is easy to set up and use.

  • Helix native DVCS requires particular configurables to be set up, and may be viewed as a security concern because it allows extraction of history (this can be controlled with pre-command triggers).

  • Helix native DVCS has some limitations at larger scales (it depends on server RAM as it creates zip files) and will not scale to terabytes of data needing to be transferred.

  • Helix native DVCS is generally deemed to produce the highest possible quality data on the target server, as it transfers raw journal data and archive content.

  • Helix native DVCS is very fast.

  • Helix native DVCS has no external dependencies.

  • Helix native DVCS bypasses submit triggers which can interfere with P4Transfer.

Points to consider for P4Transfer (and when it might be a better option):

  • If Helix native DVCS hits data snags, there may not be an easy way to work around them, unless a new P4D server release addresses them.

  • Helix native DVCS must be explicitly enabled. If not already enabled, it requires 'super' access on both Helix Core servers involved in the process.

  • P4Transfer is community supported (with paid consulting engagements also available).

  • P4Transfer can make timestamps more accurate if it has 'admin' level access, but does not require it.

  • P4Transfer automates front-door workflows, i.e. it does what a human with a p4 command line client, fleet fingers, and a lot of time could do.

  • P4Transfer can work with mismatched server data sets (different case sensitivity, Unicode mode, or P4D settings) so long as the actual data to be migrated doesn’t cause issues. (For example, trying to copy paths that contain Korean glyphs in the path name to a non-Unicode server will not work.)

  • If P4Transfer does have issues with importing some data (like the above example with Unicode paths), you can manually work around those snags, and then pick up with the next changelist. If there aren’t too many snags, this trial and error process is viable.

  • P4Transfer is reasonably fast as a front-door mechanism.

  • P4Transfer can be run as a service for continuous operation, and has successfully run for months or more.

  • P4Transfer data quality is excellent in terms of detail (e.g. integration history, resolve options, etc.). However, it is an emulation of history rather than a raw replay of the actual history in the native format as done by Helix native DVCS. The nuance in history differences rarely has practical implications.

  • P4Transfer requires more initial setup than Helix native DVCS.

  • P4Transfer requires Python (2.7 or 3.6+).

  • P4Transfer can be interfered with by custom policy enforcement triggers on the target server.

  • P4Transfer is subject to submit triggers that may block its operation, possibly intentionally.

  • Helix native DVCS requires a direct connection between the related p4d servers, whereas P4Transfer requires only a connection from a client machine to each of the p4d servers. Thus, P4Transfer can be used from any environment where it can reach each of the p4d servers, even if those p4d servers cannot communicate directly with each other, e.g. due to network firewalls.

3. Implementation

The basic idea is to create a client workspace on each Perforce Server that maps the projects to be transferred. Both client workspaces must share the same root directory and client side mapping. For example:

Source client:

Client: workspace_server1

Root: /work/transfer

View:
    //depot/myproject/dev/... //workspace_server1/depot/myproject/dev/...
    //depot/other/dev/... //workspace_server1/depot/other/dev/...

Target client:

Client: workspace_server2

Root: /work/transfer

View:
    //import/mycode/... //workspace_server2/depot/myproject/dev/...
    //import/stuff/... //workspace_server2/depot/other/dev/...

These client workspaces are created automatically from the view entries in the config file described below.
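For illustration, the view entries for the above example might look roughly like this in the config file (the views/src/targ key names follow the generated sample config, but check your own generated transfer.yaml for the exact format):

views:
  # Illustrative only - paths follow the example workspaces above;
  # verify key names and path style against your generated sample config.
  - src:  "//depot/myproject/dev/..."
    targ: "//import/mycode/..."
  - src:  "//depot/other/dev/..."
    targ: "//import/stuff/..."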

P4Transfer works uni-directionally. The tool queries the changes for the workspace files and compares these to a counter.

P4Transfer uses a single configuration file that contains the information for both servers as well as the current counter values. The tool maintains its state counter using a Perforce counter on the target server (thus requiring review privilege as well as write privilege – by default it assumes super user privilege is required, since it updates changelist owners and date/time to match the source – this functionality is controlled by the config file).

3.1. Classic/local target depots vs Streams targets

It is now possible to migrate streams to streams. This is done by creating a source workspace which can read any specified stream (it is just a normal workspace, since syncing from any stream is possible). The target workspace is a special stream workspace whose stream is a mainline with no share ... line in it, but which uses import+ to write to all the desired target streams.
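As a rough, hypothetical sketch (stream and depot names invented; the script creates the real definitions from your config), the target mainline stream might contain Paths entries like these:

# Hypothetical target stream spec - note there is no 'share ...' line,
# only import+ entries writing to the desired target streams.
Stream: //streams_target/p4transfer_target
Type:   mainline
Paths:
    import+ release1/... //streams_target/release1/...
    import+ release2/... //streams_target/release2/...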

Notes/restrictions on streams:

  • Currently supports streams→streams only (not classic to streams)

  • Takes whole streams rather than allowing filtering of source streams

  • Will auto-create target streams if necessary when you use wildcard matching ('*') in source/target stream names, e.g. allowing you to take all //streams_depot/release* streams across easily.

Important
Remember that humans should not be writing to the targets of a P4Transfer instance (streams or otherwise) - or the script may fail in strange ways!

4. Setup

You will need Python 2.7 or 3.6+ and P4Python 2017.2+ to make this script work.

The easiest way to install P4Python is probably using "pip" (or "pip3") – make sure this is installed. Then:

pip install p4python
Tip
If the above needs to build and fails, then this usually works for Python 3.6: pip3 install p4python==2017.2.1615960

Alternatively, refer to P4Python Docs

If you are on Windows, then look for an appropriate version on the Perforce ftp site (for your Python version), e.g. http://ftp.perforce.com/perforce/r20.1/bin.ntx64/

4.1. Installing P4Transfer.py

The easiest thing to do is to download this repo, either by cloning it with git or by downloading and unzipping a copy of it.

The minimum requirements are the modules P4Transfer.py and logutils.py

If you have installed P4Python as above, then check the requirements.txt for other modules to install via pip or pip3.
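For example, to install those remaining dependencies in one go:

# Run from the root of the downloaded/cloned repo (use pip instead of pip3
# if that is how P4Python was installed above)
pip3 install -r requirements.txt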

4.2. Getting started

Note that if you are running on Windows, and especially if the source server has filenames containing, say, umlauts or other non-ASCII characters, then Python 2.7 is currently required due to the way Unicode is processed. Python 3.6+ on Mac/Unix should be fine with Unicode as long as you are using P4Python 2017.2+.

Create the workspaces for both servers, ensuring that the root directories and client views match.

Now initialize the configuration file, by default called transfer.yaml. This can be generated by the script:

python3 P4Transfer.py --sample-config > transfer.yaml

Then edit the resulting file, paying attention to the comments.

Storing a password in P4Passwd is optional; provide one if you do not want to rely on tickets. The tool performs a login if provided with a password, so it should work with security=3 or an auth_check trigger set.

Note that although the workspaces are named the same for both servers in this example, they are completely different entities.

A typical run of the tool would produce the following output:

C:\work\> python3 P4Transfer.py -c transfer.yaml -r
2014-07-01 15:32:34,356:P4Transfer:INFO: Transferring 0 changes
2014-07-01 15:32:34,361:P4Transfer:INFO: Sleeping for 1 minutes

If there are any changes missing, they will be applied consecutively.

4.3. Script parameters

P4Transfer has various options – these are documented via the -h or --help parameters.

The following text may not display properly if you are viewing this P4Transfer.adoc file in GitHub. Please refer to the .pdf version instead, or open the help.txt file directly.

link:help.txt[role=include]

4.4. Optional Parameters

  • --notransfer - useful to validate your config file and that you can connect to source/target p4d servers, and report on how many changes might be transferred.

  • --maximum - useful to perform a test transfer of a single changelist when you get started (although remember this might be a changelist with a lot of files!)

  • --keywords - useful to avoid issues with expansion of keywords on a different server. Keywords make it hard to compare source/target results, and make transfers slower due to the extra work required.

  • --end-datetime - useful to schedule a run of P4Transfer and have it stop at the desired time (e.g. run overnight and stop when users start work in the morning). Useful for scheduling long running transfers (which can take many days) in quiet periods, e.g. together with the Linux at command. See the example below.
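For example, a scheduled overnight run might look like this (the datetime value is illustrative; check the --help output for the exact format --end-datetime accepts):

# Illustrative - stop the transfer at 6am; exact datetime format per --help
nohup python3 P4Transfer.py -c transfer.yaml -r --end-datetime "2022/01/15 06:00" > out1 &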

4.5. Long running jobs

On Linux, it is recommended to execute it as a background process and then monitor the output:

nohup python3 P4Transfer.py -c transfer.yaml -r > out1 &

This will run in the background, and poll for new changes (according to poll_interval in the config file.)

You can look at the output file for progress (cutting the very long lines of text which can be output), or grep the log file:

tail -f out1 | cut -c -140
grep :INFO: log-P4Transfer-*.log

Note that if you edit the configuration file, it will be re-read the next time the script wakes up and polls for input. So you do not need to stop and restart the job.

4.6. Setting up environment

The following simple setup will allow you to cross check easily source and target servers. Assume we are in a directory: /some/path/p4transfer

export P4CONFIG=.p4config
mkdir source target
cd source
vi .p4config

and create appropriate values as per your config file for the source server e.g.:

cat .p4config
P4PORT=source-server:1666
P4USER=p4transfer
P4CLIENT=p4transfer_client

And similarly create a file in the target sub-directory.
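For example (values are illustrative and should match the [target] section of your config file):

cd ../target
vi .p4config
# illustrative contents - use your own target server, user and client:
cat .p4config
P4PORT=target-server:1666
P4USER=p4transfer
P4CLIENT=p4transfer_client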

This will allow you to quickly and easily cd between directories and be able to run commands against respective source and target p4d instances.

4.7. Convenience aliases

The following bash aliases may be useful:

alias glog='export lastlogfile=$(ls -tr log-P4Transfer*.log | tail -1)'
alias tlog='tail $lastlogfile | cut -c -400'
alias clog='cut -c -400 $lastlogfile | less'

They allow you to quickly pick up the latest log file and have a look at progress or errors. Note that the cut command is used to avoid some hugely long lines which may be present in the log file.

4.8. Configuration Options

The comments in the file are mostly self-explanatory. It is important to specify the main values for the [source] and [target] sections.

P4Transfer.py --sample-config > transfer.yaml
cat transfer.yaml

The following included text may not display correctly when this .adoc file is viewed in GitHub - instead download the PDF version of this doc, or open transfer.yaml directly.

link:transfer.yaml[role=include]

4.8.1. Changelist comment formatting

In the [general] section, you can customize the change_description_format value to decide how transferred change descriptions are formatted.

Keywords in the format string are prefixed with $. Use \n for newlines. Keywords allowed are: $sourceDescription, $sourceChange, $sourcePort, $sourceUser.

Assume the source description is “Original change description”.

Default format:

$sourceDescription\n\nTransferred from p4://$sourcePort@$sourceChange

might produce:

Original change description
Transferred from p4://source-server:1667@2342

Custom format:

Originally $sourceChange by $sourceUser on $sourcePort\n$sourceDescription

might produce:

Originally 2342 by FBlogs on source-server:1667
Original change description

4.8.2. Recording a change list mapping file

There is an option in the configuration file to specify a change_map_file. If you set this option (the default is blank), then P4Transfer will append rows to the specified CSV file showing the relationship between source and target changelists, and will automatically check that file in after every run.

change_map_file = depot/import/change_map.csv

The result change map file might look something like this:

$ head change_map.csv
sourceP4Port,sourceChangeNo,targetChangeNo
src-server:1666,1231,12244
src-server:1666,1232,12245
src-server:1666,1233,12246
src-server:1666,1234,12247
src-server:1666,1235,12248

It is very straightforward to use standard tools such as grep to search this file. Because it is checked in to the target server, you can also use “p4 grep”.

Important
You will need to ensure that the change_map filename is properly mapped in the local workspace - thus it must include <depot> and other pathname components in the path. When you have created your target workspace, run p4 client -o to check the view mapping.
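For example, to look up the target changelist created for source change 1233 (the workspace path and depot path here are hypothetical; use wherever change_map.csv actually maps on your target server):

# search the local workspace copy
grep ',1233,' /work/transfer/depot/import/change_map.csv
# or search the submitted copy on the target server
p4 grep -e ',1233,' //depot/import/change_map.csv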

5. Misc Usage Notes

Note that since labeling itself is not versioned, no labels or tags are transferred.

5.1. Integration Records

Branching and integrating is implemented, as long as both the source and target of the integration are within the workspace view. Otherwise, the integrate action is downgraded to an add or edit.

5.2. Setting up as a service on Windows

P4Transfer can be set up as a service on Windows using instsrv.exe and srvany.exe to wrap the Python interpreter, or NSSM - The Non-Sucking Service Manager.

Please contact [email protected] for more details.

5.3. Comparing Repositories (and fixing any differences)

The helper script CompareRepos.py is useful for doing occasional comparisons between source and target, and, if required, for "correcting" or "fixing" any differences.

nohup CompareRepos.py -c config.yaml -s //depot/src/...@1234 > compare.out &

If you want to open all files for the corrective action (e.g. add for missing, delete for extra, edit for modified), then add the --fix option:

nohup CompareRepos.py -c config.yaml -s //depot/src/...@1234 --fix > compare.out &

After that has completed, the files will be opened in the target client workspace from the config file, and you can manually submit them, e.g.

cd target
p4 opened | less                <==== review to see if as expected
nohup p4 submit -d "Fixup differences with //depot/src/...@1234"  > sub.out &

Make sure the submit has succeeded!

5.3.1. CompareRepos.py

link:compare_help.txt[role=include]

6. Support

Any errors in the script are highly likely to be due to some unusual integration history, which may have been done with an older version of the Perforce server.

If you have an error when running the script, please use summarise_log.sh to create a summary log file to send. E.g.

summarise_log.sh log-P4Transfer-20141208094716.log > sum.log

If you get an error message in the log file such as:

P4TLogicException: Replication failure: missing elements in target changelist: /work/p4transfer/main/applications/util/Utils.java

or

P4TLogicException: Replication failure: src/target content differences found: rev = 1 action = branch type = text depotFile = //depot/main/applications/util/Utils.java

Then please also send the following:

A Revision Graph screen shot from the source server showing the specified file around the changelist which is being replicated. If an integration is involved then it is important to show the source of the integration.

Filelog output for the file in the source Perforce repository, and filelog output for the source of the integrate being performed. e.g.

p4 -ztag filelog /work/p4transfer/main/applications/util/Utils.java@12412
p4 -ztag filelog /work/p4transfer/dev/applications/util/Utils.java@12412

where 12412 is the changelist number being replicated when the problem occurred.

6.1. Re-running P4Transfer after an error

When an error has been fixed, you can usually re-start P4Transfer from where it left off. If the error occurred when validating, say, changelist 4253 on the target (which corresponds to, say, changelist 12412 on the source) and that changelist was found to be incorrect, the process is:

p4 -p target-p4:1666 -u transfer_user -c transfer_workspace obliterate //transfer_workspace/...@4253,4253
(re-run the above with the -y flag to actually perform the obliterate)

Ensure that the counter specified in your config file is set to a source changelist value prior to the failed change (i.e. less than 12412 in this example), such as the source changelist immediately before it. Then re-run P4Transfer as previously.
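For example, assuming the counter is a Perforce counter on the target server (as described in the Implementation section), its name in your config is p4transfer_counter, and the source changelist immediately before 12412 was 12410 (all values hypothetical):

# hypothetical counter name and values - use those from your own config
p4 -p target-p4:1666 -u transfer_user counter p4transfer_counter 12410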

7. Contributor’s Guide

Pull Requests are welcome. Code changes should normally be accompanied by tests.

See TestP4Transfer.py for unit/integration tests.

Most tests generate a new p4d repository with source/target servers and run test transfers.

They use the "rsh" hack to avoid having to spawn p4d on a port.

7.1. Test dependencies

Tests assume there is a valid p4d in your current PATH.

7.2. Running a single test

Pick your single test case (e.g. testAdd):

python3 TestP4Transfer.py TestP4Transfer.testAdd

This will:

  • generate a single log file: log-TestP4Transfer-*.log

  • create a test sub-directory _testrun_transfer with the following structure:

    source/
        server/         # P4ROOT and other files for server - uses rsh hack for p4d
        client/         # Root of client used to checkin files
        .p4config       # Defines P4PORT for source server
    target/             # Similar structure to source/
    transfer_client/    # The root of shared transfer client

This test directory is created anew for each test, and then left behind in case of test failures. If you want to run manual tests or view results, then export P4CONFIG=.p4config and cd into the source or target directory to be able to run normal p4 commands as appropriate.
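For example (directory and file names as per the structure above; which depots and changes you see depends on the test that was run):

export P4CONFIG=.p4config
cd _testrun_transfer/source
p4 info          # confirms you are talking to the test source server
p4 changes -m 5  # show the most recent test changelists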

7.3. Running all tests

python3 TestP4Transfer.py

It will generate many log files (log-TestP4Transfer-*.log) which can be examined in case of failure or removed.