Wednesday, August 22, 2012

Are you ready yet?

We had the privilege of participating in Are You Ready? 2012, organized by the Rotaract Club of University of Moratuwa. It was a huge success because we were able to find some great people who fit our requirements.

I thought I should write down a few points here that might be useful to fresh graduates.

Being from UOM myself, and being associated with many Mora graduates at different levels in the corporate world, I have no doubt about the capabilities and competencies of UOM products. Yet it's sad to see some candidates perform poorly at interviews merely because they haven't paid attention to some simple facts.

Interviews need practice. Make sure that your first interview is not with the actual place you want to join. The opportunities to participate in interviews are so many and they are FREE! Career days are wonderful for you to get this learning experience.

Especially at a career day like this, we don't get to read your CV word by word. Even at the office this is usually the case. People have better things to do than read hundreds of boring CVs. We'd usually scan through. We'd pay attention to some areas, like projects and internships. Be mindful of that when you prepare the CV. Highlight only the important points and keep the CV short and sweet.

If there's one good piece of advice that can be given to a fresh candidate, it's to think like an interviewer. See yourself in the other chair and imagine you're listening to the answers. It can make a world of difference to the way you present yourself.

The first thing you get to speak about in at least 95% of interviews is yourself. This is not your grade 3 English class where you get to speak for 2 minutes about “Myself”. No! But it's hard to believe how many people actually go on “I'm Sunil Gamage, I'm from Panadura. My father is a government officer, my mother is a housewife, I have 2 brothers and 1 sister .... blah blah blah” and sometimes if we don't intervene and stop them they'd go on describing pets, hobbies and what their neighbors look like!
Remember, we're interested in hiring YOU! Not your parents or your siblings, and it doesn't matter if your father is a farmer, a government officer or the president. It doesn't matter where in the island you come from.

Here is your opportunity to paint the image about yourself as you want it in the mind of the interviewer, your opportunity to show what a great hire you can be, how confident you are and how passionate you are about work.
These few golden minutes are just too important to waste.

Next, since you don't have work experience yet, there are a few things any interviewer will ask about.
  1. The final year project
  2. Other projects
  3. Internship period
All interviewers, especially those who represent IT companies, love to see a lot of projects in your CV, especially the ones that you did outside the mandatory requirements of your degree. One candidate who had a poor GPA made it to the next level simply because of the Google Summer of Code projects he did out of his own interest.

You may make a few grammar mistakes and a few pronunciation mistakes, and that's ok. We do not, I repeat, do not hire you for your English language proficiency, but we DO hire you for your communication skills. What is the difference? When you work in a company you do not work alone; you have to deal with people at different levels. It's important that you're able to convey an idea, a message, clearly and effectively. When you talk about your projects, if we understand, that means you have communicated well. But if you have left us scratching our heads and wondering what you have done, that means you haven't communicated well. Can you communicate well in broken English? Sure you can!

No company has employees only from UOM, or only those who got 3 As for their A/L. That means we get CVs from people with different levels of qualifications. Among them, your CV is sure to shine. You already have a lot on your side. So why be nervous? Be confident, take a deep breath and feel good about yourself. Don't let some questions, to which you don't know the answer, make you feel stupid. Nobody in the world knows everything about even a single subject. If you don't know, fine, admit it, and maybe try to make an intelligent guess and say that it's a guess. Do not expect interviewers to feel confident about you if you are not showing any confidence yourself.

Chances are that you are smarter than your interviewers, just like back in school you were smarter than most of your teachers. You were smarter but your teachers taught you. Why? For the simple reason that they knew more about the subject than you did. So is the case with interviews: the interviewers know more about the industry, the company and the requirements they have. Be smart and confident, but don't be arrogant and put on airs in front of your interviewers. Nobody wants to work with people who believe they are know-alls.

Finally, Google and find some articles and books on facing interviews. Spend at least as much time preparing for your interviews as you'd spend on an assignment. Think about it. You may get a few extra marks for the assignment and forget about it, but an interview will determine what you do and how much you get in terms of experience, and of course money, for at least a few years.
Isn't it much more critical?

Monday, August 13, 2012

SVN Repository Migration

I work for a product based IT company. They started using svn in 2004 and have maintained a single repository for everything: for projects, for docs, for private experiments, for dogs and for cats! The repository has grown and grown and grown like a well fed tree and had reached around 275 GB by 2012.

The backup process was taking around 6 hours, and a senior manager was threatening to get rid of all the history and export the content to a new repository.

Now as software engineers we know that's not a very good thing to do. How many times do we take our shovels, put on our overalls and dig deep into the mystic history of svn in search of skeletons and criminals? No, we NEED that history!

I, the poor little new joinee who never had anything to do with any kind of repository maintenance, was given the mammoth task of finding a solution, and finding it quick. Break that big evil repo into pieces, I was told.

What kind of hammers can I use for this? As the first step I downloaded the svn book 1.7, a document freely available on the internet which is compiled by the svn crew. And to cut a long story short, here are the options I considered, in a nutshell.

Investigated Options

Copy & Delete 
Why not copy the entire repository and delete the unwanted projects along with their history? That should be simple enough. There is a little problem though. SVN doesn't provide an option to delete part of a repository along with its history. You simply can't get rid of your past in svn. If you really, really want to do it, svn makes you work real hard for it. Well... isn't there a command called svn delete? Yes, but it only marks the files as deleted. It retains everything and in fact increases the repo size. Conclusion: cannot be used.

svn export
This command is used to export a clean directory tree with no history and no metadata (those pesky little .svn files that like to hide). Not a good option from the developers' point of view. The whole point of this exercise is to save the history. But this can be used for parts that do not require history, of course.

svnsync
This is the Subversion remote repository mirroring tool. It lets you mirror a repository and keep the mirror up to date by syncing it with the original from time to time. svnsync does the magic by replaying the revisions of one repository into another one. And the good news is that yes, it can be used to break a big repo into several smaller repos by creating mirrors of sub trees. Tests have shown that svnrdump, which does things in a very similar way, has better performance. But you may get some nasty errors due to property validations. Take a look at the property validation topic for further details. There is no option to skip validations. Fixing the errors manually is possible, but not feasible considering the magnitude of the task.

svnadmin dump/load
This command is used to dump the contents of the file system and load it into a new repository. But it cannot dump a sub tree. It's the entire thing or nothing. It can't help us alone, but we can use it with something else called svndumpfilter.

svnrdump dump/load
This is a shiny new feature available in SVN 1.7. It proudly announces that it can be used for Remote Repository Data Migration. In simple terms, that means even if you're not an admin and even if you're not logged into the svn server, you can create a dump remotely. It's the cousin of svnadmin dump and a more flexible one. You can dump a sub-tree of the repository through this, something svnadmin dump cannot do. The same property validation errors occur as in svnsync when loading. But the good news is that svnadmin load can skip validations, and skipping validation does no harm.

svndumpfilter
Ladies and gentlemen, let me introduce you to the hero of the day! (Or should I say the hero who saved my day).... svndumpfilter! It is a utility for removing history from a Subversion dump file by either excluding or including paths beginning with one or more named prefixes. As per the specs it can operate on any dump file, filter it and give a dump that has only the desired content.
There's a small glitch though. We know 2 ways of creating svn dumps: svnadmin dump and svnrdump. As of svn 1.7, svnadmin dump creates a version 2 dump file by default. You can ask it for version 3 if you want. As the new kid on the block, svnrdump creates only version 3 dumps. svndumpfilter, being old fashioned, doesn't like to deal with the new type of dump. (It's a known bug.) So we must give it a version 2 dump, one created by svnadmin dump. It can migrate multiple projects together. It's not possible to use both include and exclude together, but you can always do them one after another. For example, include some stuff and get a dump, and then filter the resulting dump to exclude things from it.

Feasible Solutions

After analyzing all of the above, the following 2 were identified as the feasible solutions. Multiple projects can be migrated to the same repository using both options.
  • Option 1: svnrdump dump & svnadmin load. The destination repository needs to be in SVN 1.7. (Remember svnrdump is new.) Since it's a new feature there can be bugs. (After all we're all developers.)
  • Option 2: svnadmin dump & svndumpfilter. This has been time tested and offers some neat options that are handy.

Revision Numbers

This is something a lot of people wanted to keep. Bugzilla matched bugs with revision numbers. Release documents contained so many references to revision numbers. X was born in revision 2563 and x fell in love with y in revision 5845 and they got married in revision 6541 and had their first kid in revision 5425 and the story continued for generations. Now we can't blame anyone for being sentimental about revision numbers, can we?

svndumpfilter gives some nice options regarding revisions. You can drop empty revisions and renumber the remaining ones. Or you can keep the original ones as they are. So we thought it was easy. It just happened that I had to create some directories in the new repository before I loaded my dump into it. Repository creation was revision 0 and creating these directories was revision 1. And when it started loading, all my revision numbers were getting incremented by one!

<<< Started new transaction, based on original revision 1
------- Committed new rev 2 (loaded from original rev 1) >>>

<<< Started new transaction, based on original revision 2
------- Committed new rev 3 (loaded from original rev 2) >>>

Well, my supervisor and I thought, that can be forgiven, it's just one. We can just tell people to look at the x-1 revision. And then suddenly a light bulb flashed! Maybe, just maybe, if we dump from the second revision and load it again? And yes, to our delight, it worked.
svnadmin dump /svn/tempRepo -r 2:HEAD | svnadmin load --bypass-prop-validation /svn/destinationRepo

Option 1: svnrdump dump & svnadmin load
  • Revision numbers are identical in source and destination.
Option 2: svnadmin dump & svndumpfilter
  • Can keep the original revision numbers.
  • If filtering causes any revision to be empty, can remove these revisions from the dump (--drop-empty-revs).
  • Can renumber the revisions that remain after filtering (--renumber-revs).

We decided to go with option 2 for the sake of revision numbers.

Dependency Resolution

Creating a dump sounds like a very easy thing to do: just run the command and wait, right? Yes, that can be the case if the project teams at your company are enemies and have sworn never to touch each other's code. But in reality, people think svn is such a cool tool and use it to copy and move stuff from completely random places to another set of completely random places. What they don't know is that svn watches their every move and records it. Say you copy something like /projectA/x/y/ to somewhere in your projectB. You have created a dependency, and if you want to filter project B you have to include that copy source path too.
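One way to get a list of such dependencies up front is to read the copy records straight out of the full dump. The following is only a sketch: it assumes a plain format 2 dump created by svnadmin dump, in which each copied node carries a Node-copyfrom-path header after its Node-path header, and the function name external_copy_sources is made up for this post.

```shell
# List copy sources that lie outside a project but were copied into it.
# Usage: external_copy_sources <dump file> <project path>
external_copy_sources() {
    dump_file=$1
    prefix=${2#/}    # paths inside a dump file carry no leading slash
    LC_ALL=C awk -v p="$prefix" '
        # true when path is the project itself or something below it
        function inside(path) { return path == p || index(path, p "/") == 1 }
        /^Node-path: /          { node = substr($0, 12) }
        /^Node-copyfrom-path: / {
            src = substr($0, 21)
            if (inside(node) && !inside(src)) print src
        }
    ' "$dump_file" | LC_ALL=C sort -u
}
```

For example, external_copy_sources fullRepo.dump /projects/A prints the paths that would have to be added to the include list for project A. A copied path may itself have been copied from somewhere else, so the check may need to be repeated for each path it reports.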

Handling Binaries

Svn stores data using a delta-based algorithm. As the svn book puts it neatly:

“To keep the repository small, Subversion uses deltification (or delta-based storage) within the repository itself. Deltification involves encoding the representation of a chunk of data as a collection of differences against some other chunk of data. If the two pieces of data are very similar, this deltification results in storage savings for the deltified chunk: rather than taking up space equal to the size of the original data, it takes up only enough space to say, ‘I look just like this other piece of data over here, except for the following couple of changes.’ The result is that most of the repository data that tends to be bulky (namely, the contents of versioned files) is stored at a much smaller size than the original full-text representation of that data.”

But things get ugly with binaries. All our jars, docs and pdfs fall into this category. They cannot be diffed efficiently. So when your tech writer lady corrects a typo and checks in a huge document, svn ends up saving what is practically another copy of the document as a new revision.

In our repository there was a folder to which jars were checked in. Excluding this folder reduced the repository size by more than 50%. We did things in a smart way and didn't try to add it to svn again. It was decided to maintain it in a normal directory. Who checked in what, and when, didn't matter for these jars, which are third party tools. The IT folks promised to enforce the necessary permissions so that these won't be deleted by anyone.

There was also a folder with documents owned by the tech writers. This part of the repository was twice as big as the space taken by the company's biggest project. We were keeping the old repo as read only anyway, and our dear tech writers agreed to kiss the history goodbye for the docs folder. Docs were given a fresh start in life as they were simply checked in as new content to the new repos.

Loading multiple projects to the same repository

Specify the parent directory with --parent-dir. Otherwise the dump will be loaded to the root.
svnadmin load --bypass-prop-validation /svn/destinationRepo --parent-dir A < A.dump
svnadmin load --bypass-prop-validation /svn/destinationRepo --parent-dir B < B.dump

Roadblocks You May Encounter


svnadmin load may fail with the following error.
svnadmin: E125005: Invalid property value found in dumpstream; consider repairing the source or using --bypass-prop-validation while loading.
svnadmin: E125005: Cannot accept 'svn:log' property because it is not encoded in UTF-8

The same error is reported as below for svnsync.
Committed revision 35670.
Copied properties for revision 35670.
svnsync: At least one property change failed; repository is unchanged
svnsync: Error setting property 'log': Could not execute PROPPATCH.

This error occurs because non-UTF-8 encodings are not supported in svn logs. As the svn book explains:

“Newer versions of Subversion have grown more strict regarding the format of the values of Subversion's own built-in properties. Of course, properties created with older versions of Subversion wouldn't have benefited from that strictness, and as such might be improperly formatted. Dump streams carry property values as-is, so using Subversion 1.7 to load dump streams created from repositories with ill-formatted property values will, by default, trigger a validation error. There are several workarounds for this problem. First, you can manually repair the problematic property values in the source repository and recreate the dump stream. Or, you can manually tweak the dump stream itself to fix those property values. Finally, if you'd rather not deal with the problem right now, use the --bypass-prop-validation option with svnadmin load.”

One solution is to manually update the logs that are not encoded in UTF-8.
svn propget svn:log --revprop -r 35670 | iconv --to-code UTF8//IGNORE -o /tmp/iconv.out
svn propset svn:log --revprop -r 35670 -F /tmp/iconv.out

For my scenario the best solution was to bypass the property validations in svnadmin load. What I needed to do was migrate the content as it is to the new repositories. Fixing somebody else's dirty work was not in my specs!
svnadmin load --bypass-prop-validation /svn/destinationRepo < sourceDump.dump

When I tried to use svndumpfilter on a dump that was created by svnrdump, the following error was encountered.
svnrdump dump | svndumpfilter include /A > A.dump
svndumpfilter: E140001: Unsupported dumpfile version: 3
svnadmin dump, which creates a dump from a local repository, uses 'format 2' by default. svnrdump, which creates a dump from a remote repository, creates 'format 3' dump files only. svndumpfilter supports only 'format 2' and not 'format 3'.
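
If you are not sure which kind of dump you are holding, the first line of the file tells you. This is a small sketch that assumes the dump begins with the usual SVN-fs-dump-format-version header; the function name dump_format_version is just something I made up for illustration.

```shell
# Print the format version recorded on the first line of a dump file.
# Prints nothing if the first line is not the expected header.
dump_format_version() {
    head -n 1 "$1" | sed -n 's/^SVN-fs-dump-format-version: //p'
}
```

Running dump_format_version A.dump before handing the file to svndumpfilter shows whether it is a format 2 dump or a format 3 one.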

Invalid copy source path
svndumpfilter include /projects/A < fullRepo.dump > A.dump
svndumpfilter: Invalid copy source path '/projects/B/xyz'

Say you want to move your project A to a different repository. You proudly say that your project is independent and can survive alone. But then when you put the filter to work to create your dump, you get this error. One of your smart developers had seen that project B had just what he wanted and decided to steal their code. He had done a svn copy from /projects/B/xyz to /projects/A, and now svn says that unless you give it that path too it will never create the dump. Svn can be one tough kid. What you can do is simply give it what it asks for. I call it resolving dependencies. And for a big project there can be quite a number of dependencies.
svndumpfilter include /projects/A /projects/B/xyz < fullRepo.dump > A.dump

Sometimes you may be forced to add content that must not be in that repository. In that case you can do a svn delete and delete it from head once the loading is complete.

Say you create a dump like this

svndumpfilter include /projects/A /projects/B/xyz /projects/C/abc/efg < fullRepo.dump > A.dump 

And try to load it
svnadmin load --bypass-prop-validation /svn/destinationRepo < A.dump 

And it gives an error:
svnadmin: E160013: File not found: transaction '5104-3xs', path 'project/B'
This occurs when SVN is unable to figure out certain paths. Manually creating the path fixes the issue. What you need to create are the intermediary directories of the paths given in the include, as in the following example.

svndumpfilter include /projects/A /projects/B/xyz /projects/C/abc/efg < fullRepo.dump > A.dump
svnadmin create /svn/destinationRepo
svn mkdir -m "create folders" \
file:///svn/destinationRepo/projects \
file:///svn/destinationRepo/projects/A \
file:///svn/destinationRepo/projects/C \
file:///svn/destinationRepo/projects/C/abc
svnadmin load --bypass-prop-validation /svn/destinationRepo < A.dump 

The directories can be created in one shot using the --parents option.

svn mkdir -m "create folders" --parents \
file:///svn/destinationRepo/projects/A \
file:///svn/destinationRepo/projects/C/abc
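
With a long include list it is easy to miss a directory, so the list can be derived from the include paths themselves. The sketch below rests on an assumption: that what must exist before loading is the parent directory of each included path, as with /project/C/applications for /project/C/applications/journal in the summarized commands. The name parent_dirs is made up for this post.

```shell
# Print the parent directory of every include path, once each,
# leaving out the repository root.
parent_dirs() {
    for path in "$@"; do
        dirname "$path"
    done | LC_ALL=C sort -u | awk '$0 != "/"'
}
```

Each line of output can then be appended to the file:// URL of the new repository and passed to svn mkdir --parents. For the include above it gives /projects, /projects/B and /projects/C/abc.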

This error can also occur due to missing dependencies that didn't cause issues during filtering. If you get this error for a file, take the svn log for that file and see what happened in the particular revision where the error occurs. For example, I got this error due to a folder rename and had to include the earlier path name and create a fresh dump.

After the migration

Once the migration is complete, compare the svn logs of the source and the destination. Getting the logs in XML format is good if the source and destination are on two different svn versions, since the format of the logs may differ between versions. A merge tool can be used to compare the two logs in XML format. Viewsvn can also be used to verify the contents after migration.

Once the migration is done, users' working copies have to be pointed to the new repositories. What we did was simply ask the users to get fresh checkouts. The following can be useful for those who don't like to do that.

1. svn relocate: Relocate the working copy to point to a different repository root URL. This “rewrites” the working copy's administrative metadata to refer to the new repository location. But it first compares the UUID of the repository against what is stored in the working copy. If the UUIDs don't match, the working copy relocation is disallowed. We have two ways of keeping the UUID of the source. Please note that this will make both repos have the same UUID.
  • svnadmin load has the option --force-uuid. By default, when loading data into a repository that already contains revisions, svnadmin will ignore the UUID from the dump stream. This option will cause the repository's UUID to be set to the UUID from the stream.
  • svnadmin setuuid resets the repository UUID for the repository located at REPOS_PATH. If NEW_UUID is provided, it is used as the new repository UUID; otherwise, a brand-new UUID is generated for the repository.

2. svn upgrade: Upgrade the metadata storage format for a working copy. This will be needed for the users to upgrade to svn 1.7. As new versions of Subversion are released, the format used for the working copy metadata changes to accommodate new features or fix bugs. Older versions of Subversion would automatically upgrade working copies to the new format the first time the working copy was used by the new version of the software. Beginning with Subversion 1.7, working copy upgrades must be explicitly performed at the user's request. svn upgrade is the subcommand used to trigger that upgrade process. If you attempt to use Subversion 1.7 on a working copy created with an older version of Subversion, you will see an error.
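
For the svn relocate route, the UUID of the old repository can be read with svnlook uuid and handed to svnadmin setuuid. A minimal sketch, assuming both repositories are on the local disk of the machine where it runs; copy_uuid is a hypothetical helper name, not an svn command.

```shell
# Give the new repository the same UUID as the old one.
# Usage: copy_uuid <old repository path> <new repository path>
copy_uuid() {
    old_repo=$1
    new_repo=$2
    svnadmin setuuid "$new_repo" "$(svnlook uuid "$old_repo")"
}
```

After something like copy_uuid /svn/oldRepo /svn/destinationRepo (both paths are placeholders), users can run svn relocate with the old and new URLs inside their working copies.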

Summarized Commands

At the source
svnadmin dump $SOURCE_REPO | svndumpfilter include \
/component \
/docs \
/project/A \
/project/B \
/project/C/applications/journal > $DUMPDIR/A1.dump
svndumpfilter exclude /docs/userguides/custom/ < $DUMPDIR/A1.dump > $DUMPDIR/A.dump

At the destination
rm -rf $REPO_LOCATION/tempRepo
svnadmin create $REPO_LOCATION/tempRepo
svn mkdir -m "create initial folders" --parents \
file://$REPO_LOCATION/tempRepo/project/C/applications
svnadmin load --bypass-prop-validation $REPO_LOCATION/tempRepo < $DUMPDIR/A.dump
rm -rf $REPO_LOCATION/destinationRepo
svnadmin create $REPO_LOCATION/destinationRepo
svnadmin dump $REPO_LOCATION/tempRepo -r 2:HEAD | svnadmin load --bypass-prop-validation $REPO_LOCATION/destinationRepo

svn delete -m "delete unwanted content" file://$REPO_LOCATION/destinationRepo/project/C